Introduction


1.1  Introduction 

The goal of the research reported in this thesis is to model the performance of the Concert Multiprocessor [A2] in order to answer the following questions:

1. What is the performance of the system as designed and built with respect to some metric?

2. Why is the performance as it is? What factors influence the performance, what is the sensitivity of the performance to these factors, and what are the limitations of the system design?

3. How can the performance be improved and where should the design be modified to achieve this improvement? What are the critical sections and bottlenecks in the design?

An answer to the first question satisfies a natural curiosity; an answer to the second gives users ideas on how to structure programs and applications to achieve the best possible performance of the Concert system; and finally, an answer to the third indicates how to achieve better performance in future designs. Another outcome of the work described herein is that it provides a starting point for future modeling efforts. The experience and knowledge gained through this research can be used to guide the development and application of higher-level and/or more complex models.

The performance metric used in this research is the throughput of the system. This metric is simple and yet represents the basic goal of multiprocessor systems. However, throughput is a rather crude metric to use when comparing the performance of different systems, because structural and organizational differences often cause the definition of throughput to differ. Fortunately, the main use of throughput in this thesis is to gauge the change in performance due to variations in the parameters of the system or due to small modifications of the system. Throughput is well suited for this kind of study.




The system is modeled at the memory access level, and thus throughput in this case is the average number of memory accesses per unit time. Each processor is assumed to spend most of its time accessing its associated local memory. (Organization of the system is discussed in detail in the next section.) The processor model employed is the simplest imaginable in such a case: the processor spends some time performing local processing, after which it makes a non-local memory access (for which it may have to wait for bus mastership) and then resumes local processing. The operation of each processor is assumed to be independent of that of all other processors. The reasons for the memory access modeling level, the simple processor model, and the assumption of independent processors are the same: at this point in time not enough is known about the languages, programming models, programs, and applications to obtain more detailed models. Furthermore, the Concert system is designed to be a testbed for the examination of many different multiprocessor ideas. The common denominator of all Concert applications is the system itself, and that is where this research is focused.

The basic premise of this research is to start with some very simple models, develop them fully, evaluate them, and then determine how the models can be improved. Complexity is always easy to add to models, sometimes to the point that they become unwieldy, but it is more difficult to add complexity in a way that keeps the models simple but accurate. Thus this thesis should be considered the first step in an iterative cycle to obtain models incorporating additional features such as processor dependencies, language issues, and programming models.
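The throughput metric and the simple processor model described above already imply two elementary upper bounds. A processor that never waits completes one non-local access per processing-plus-access cycle, and a bus that serves one access at a time can never exceed one access per access time. The sketch below computes these standard asymptotic bounds under that reading; the function names, and the assumption that throughput counts accesses on a single shared bus, are illustrative and not taken from the thesis.

```python
def throughput_upper_bound(n_procs: int, mean_tp: float, mean_ta: float) -> float:
    """Asymptotic upper bound on bus throughput (accesses per unit time).

    Hypothetical helper: assumes n_procs independent processors, each
    alternating a processing period (mean mean_tp) with one bus access
    (mean mean_ta), sharing a single bus that serves one access at a time.
    """
    if n_procs < 1 or mean_tp < 0 or mean_ta <= 0:
        raise ValueError("need n_procs >= 1, mean_tp >= 0, mean_ta > 0")
    # No waiting at all: each processor completes one access per tp + ta.
    per_processor_limit = n_procs / (mean_tp + mean_ta)
    # Saturated bus: at most one access per access time.
    bus_limit = 1.0 / mean_ta
    return min(per_processor_limit, bus_limit)


def saturation_point(mean_tp: float, mean_ta: float) -> float:
    """Number of processors at which the two bounds cross."""
    return (mean_tp + mean_ta) / mean_ta


print(throughput_upper_bound(1, 9.0, 1.0))   # 0.1
print(throughput_upper_bound(20, 9.0, 1.0))  # 1.0
print(saturation_point(9.0, 1.0))            # 10.0
```

Below the crossover point the processors themselves limit throughput; above it the shared bus does, which is one simple way to frame the sensitivity question posed earlier.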

Because of the size and complexity of the Concert Multiprocessor, direct modeling of the system (even with the simple processor model) would be a formidable task. The approach taken in the sequel is to decompose the system into subsystems along the lines of the system's natural hierarchies. Each subsystem is analyzed in detail, and then all the subsystem models are integrated to determine the performance of the total system. Analytical models are used for each subsystem. The functional equations associated with analytical models allow easy prediction and quick evaluation of the effect of various changes in the model parameters. In short, they allow a lot of ground to be covered in a structured manner, and this makes them ideally suited to the first step of the iterative cycle described earlier.

Simulation is employed in this thesis in a few instances where the analytical models become intractable or unmanageable. However, the main use of simulation is to determine the accuracy of the integrated models.

The research described herein started when the author joined the Concert Project just after construction had begun. Thus this work in no way affected the design of the system as described in the next section and in Anderson [A2]. The optimum time to begin modeling is during the design stage. Unfortunately, only the most rudimentary (and flawed¹) simulations were performed at that time. Although conducted after the design stage, this research is still extremely useful in answering the three basic questions posed earlier.

This thesis is organized into four chapters, and each chapter is divided into sections. The next section in this first chapter describes the Concert Multiprocessor. The section after that presents more details on the modeling level and modeling strategy; the factors considered in this study and the assumptions made are discussed there in detail. The final two sections in this chapter briefly discuss previous work in this area and preview the following chapters.

¹ In his simulation of the Ringbus arbiter, Anderson created a queue of requests for each slice (defined in section 1.2) and terminated the simulation when all the queues emptied. However, if a queue emptied while at least one other queue was still nonempty, the simulation continued to run and to collate statistics, with null requests (i.e. the absence of a request) generated for each slice with an empty queue. Thus the statistics were biased by the stream of null requests once a queue emptied.
1.2 The Concert Multiprocessor*

Concert is a tightly-coupled, shared memory multiprocessor. It consists of multiple processors, each executing portions of code, communicating through shared memory to cooperate on the solution of a large task (or tasks). It is classified as a multiple instruction stream, multiple data stream (MIMD) computer (1*1].

The Concert Multiprocessor consists of a hierarchy of time-shared (i.e. circuit-switched) buses. At the top level, eight slices are interconnected by bus segments as shown in Figure 1.1.


Figure 1.1: Top level view of Concert

Circuitry within each slice connects the two adjacent bus segments either to different internal slice resources or to each other, so that all internal slice resources are bypassed. An electrical connection can be established from a resource within one slice (the source) to a resource within a different slice (the destination) by an appropriate connection of the bus segments within the source and destination slices and by joining the bus segments together in all slices between the source and destination. Each bus segment is bidirectional; thus source and destination slices may be connected by a path in either the clockwise or the counterclockwise direction. More than one source-destination connection can be supported simultaneously provided that 1) there is a contiguous connection of segments from each source to its destination, and 2) each bus segment and each slice resource is allocated to at most one source-destination connection. Various simultaneous connections are depicted in Figure 1.2.


* Only the details of the design which are felt to be relevant to the modeling effort in the sequel are discussed here. See Anderson [A2] for more complete information.
Figure 1.2: An example of simultaneous connections on the Ringbus

Note that a maximum of eight simultaneous connections can be supported (e.g. if each slice and its immediate clockwise neighbour comprise a source-destination pair). Once a connection is established from source to destination, that connection is maintained, and all the resources involved in that connection remain allocated to only that connection until the source slice no longer requires the connection. A central arbiter, shown in Figure 1.1, controls the allocation and connection of the bus segments. The ring of bus segments shown in Figures 1.1 and 1.2 is called the Ringbus.

Each slice consists of up to eight processor-local memory pairs (one local memory block per processor is the usual, but not necessary, configuration), a global memory block, a time-shared bus called the Multibus, and a Ringbus Interface Board (RIB). Each processor communicates with its local memory over a dedicated bus called the high speed bus (HSB). This bus is private to the processor and independent of the Multibus and other high speed buses. All the processors and memories (both local and global) are also connected to the Multibus. The Multibus, the global memory (via an HSB), and the Ringbus segments adjacent to that slice connect to the RIB. Various access paths and circuitry inside the RIB (described in section 1.2.2) allow these items to be interconnected. The resources of a slice that are available for interslice communication can be divided into two mutually exclusive groups: source resources and destination resources. The processors connected to the Multibus are the only source resources. The destination resources consist of the global memory and some global registers (which are inside the RIB).

Only three types of communication, all originated by processors, can occur in the Concert Multiprocessor.† A processor can communicate with (i.e. access) its local memory via the HSB; the local memory of other processors on its slice and the global memory of its slice via the Multibus; and the global memory of other slices (and the global registers of its slice) via the Multibus and Ringbus. We term these types of accesses HSB, Multibus, and Ringbus accesses respectively. Note that a processor cannot communicate directly with other processors or with the local memory of processors on other slices; such communication must occur through the local or global memory. All bus transactions in Concert are single memory transactions: read, write, or read-modify-write. Successive accesses require establishment of direct bus connections from source processor to destination memory for each access. Thus there is no store-and-forward mechanism or anything of this kind on the Ringbus or elsewhere.

† Communication can also be originated by other potential Multibus masters such as I/O devices. However, these other potential masters essentially behave like processors.
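The three access types can be summarized as a small classification rule. The sketch below is a hypothetical helper (the `Target` record and the `access_type` function are not part of the thesis); it assumes the usual configuration of one local memory block per processor and identifies each processor by its slice and an index within that slice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    kind: str                      # "local", "global_mem", or "global_reg"
    slice_id: int
    proc_id: Optional[int] = None  # owning processor (local memory only)


def access_type(src_slice: int, src_proc: int, target: Target) -> str:
    """Return 'HSB', 'Multibus', or 'Ringbus' for an access by (src_slice, src_proc)."""
    if target.kind == "local":
        if target.slice_id != src_slice:
            raise ValueError("local memory on another slice is not directly accessible")
        # Own local memory over the private HSB; a neighbour's over the Multibus.
        return "HSB" if target.proc_id == src_proc else "Multibus"
    if target.kind == "global_mem":
        # The own slice's global memory is reached over the Multibus alone.
        return "Multibus" if target.slice_id == src_slice else "Ringbus"
    if target.kind == "global_reg":
        # Global registers, even those of the own slice, always need the Ringbus.
        return "Ringbus"
    raise ValueError(f"unknown target kind: {target.kind!r}")


print(access_type(0, 3, Target("local", 0, 3)))    # HSB
print(access_type(0, 3, Target("global_mem", 2)))  # Ringbus
```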

The structure of a four slice version of the Concert Multiprocessor is illustrated in Figure 1.3. This figure shows all major interconnections within Concert and illustrates some representative accesses of each of the three types.


The Multibus, the RIB, and the Ringbus arbiter are now discussed in more detail.
1.2.1 Multibus

The Multibus is an IEEE 796 standard multi-master bus. An additional bus, which runs parallel to this 796 bus, is physically divided into shorter independent bus segments, each of which serves as the high speed bus for a processor. The processors and memories are commercially available dual-ported boards (Microbar Inc. products DI1C68K and DlillSO respectively) that each have one Multibus port and one HSB port. As described earlier, the HSB is private to a processor; thus there is only one processor per HSB. The processors are based on the Motorola MC68000 microprocessor.

When a memory access is initiated, a processor first attempts to access the desired location on the HSB. If this attempt is successful, the memory access proceeds. If it is not successful, the processor accesses the location via the Multibus. Thus a processor accesses its own local memory over its HSB, and the local memory of other processors or global memory over the Multibus. Accesses on the HSB take considerably less time than accesses on the Multibus due to the differences between the HSB and Multibus protocols.

Contention for the mastership of the Multibus is resolved by a round-robin arbitration unit. This unit takes a maximum of two Multibus clock cycles (10 MHz clock), as pictured in Figure 1.4.
Figure 1.4: Multibus arbitration signals

One cycle is required to latch the request lines and another is required for the arbitration and propagation delay. This arbitration unit grants possession of the bus to a processor for only as long as it takes to complete a single memory access, which cannot exceed 16 bits. The 68000 can perform byte (8 bit), word (16 bit), and long word (32 bit) operations. Long word operations consist of two separate 16 bit accesses; thus a processor must gain control of the bus twice for a long word access. Other processors may seize the bus between these two accesses.
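The arithmetic implied by these figures is easy to make explicit. The sketch below assumes the worst-case two-cycle arbitration at the stated 10 MHz clock and the 16-bit limit per bus tenure, and computes the maximum arbitration delay and the number of separate bus tenures a 68000 operand requires; the constant and function names are illustrative.

```python
MULTIBUS_CLOCK_HZ = 10e6    # 10 MHz Multibus clock, as stated in the text
ARBITRATION_CYCLES_MAX = 2  # one cycle to latch requests, one to arbitrate


def max_arbitration_delay_ns(cycles: int = ARBITRATION_CYCLES_MAX,
                             clock_hz: float = MULTIBUS_CLOCK_HZ) -> float:
    """Worst-case Multibus arbitration delay in nanoseconds."""
    return cycles * 1e9 / clock_hz


def bus_tenures(operand_bits: int) -> int:
    """Separate Multibus tenures needed by one 68000 data operation."""
    if operand_bits not in (8, 16, 32):
        raise ValueError("68000 operands are 8, 16, or 32 bits")
    # Each tenure carries at most 16 bits, so a long word needs two,
    # and other masters may seize the bus between them.
    return (operand_bits + 15) // 16


print(max_arbitration_delay_ns())                      # 200.0
print(bus_tenures(32) * max_arbitration_delay_ns())    # 400.0
```

A long word therefore requires arbitration twice (up to 400 ns of arbitration delay in the worst case), and its two halves can be separated by other processors' accesses.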

Contention also exists for local memories, since a local memory can be addressed simultaneously over a processor's HSB and over the Multibus. This contention is resolved by arbitration circuitry on the dual-ported memory boards.
1.2.2 RIB

When the RIB recognizes a memory access on the Multibus in the Ringbus address space (i.e. a Ringbus access), it decodes the destination slice from the address of the access and sends a request to the Ringbus arbiter for a connection between the Multibus of the source slice and the destination slice. When the Ringbus arbiter grants the request, it directs some number of RIBs to form a path between the source and destination and then lets the memory access at the source slice proceed.

A diagram of the access paths within the RIB is shown in Figure 1.5. Arrows denote the directionality of the paths, and lines perpendicular to a path denote a switch which can be either open or closed.


Figure 1.5: RIB access paths

Notice that the Ringbus access paths are asymmetrical. Memory accesses enter the Ringbus on the segment to the clockwise direction of the source RIB and exit via the Ringbus segment to the counterclockwise direction of the destination RIB. This causes the Ringbus to be biased toward memory accesses in the clockwise direction around the Ringbus. As depicted in Figure 1.6, a memory access to a neighbouring RIB in the clockwise direction requires one Ringbus segment, compared to three for the neighbouring slice in the counterclockwise direction. (This last access could also be made in the clockwise direction; for a Ringbus with eight segments, this would require seven segments.)
Figure 1.6: Access paths to neighbouring RIB

The asymmetrical access paths clearly reduce the maximum number of accesses that can occur simultaneously on the Ringbus if any of the accesses take place in the counterclockwise direction. The designers of the Concert system felt that the asymmetrical access paths would simplify the Ringbus arbiter (see section 5.2.2 in Anderson [A2]).

The same dual-ported memory boards used for the local memories on the Multibus are used for the global memories. As indicated in Figures 1.3 and 1.5, the Multibus port of the global memory connects directly to the Multibus of that slice. The HSB port of the global memory connects to the Ringbus. As before, arbitration circuitry on the global memory board handles simultaneous Multibus and Ringbus accesses to that board. Note that all accesses to global memory require some portion of the Ringbus, except for accesses to the global memory in the same slice as the processor making the access.

There are also a small number of global registers located in the RIB (they are not shown in any of the figures) for various sundry purposes such as resetting the slice, interrupting processors in the slice from a processor external to the slice, enforcing read and/or write protection on the slice's global memory, and some limited performance monitoring. These registers are accessed in the same manner as the global memory, except that a slice cannot access its global registers directly from the Multibus. All global register accesses require the Ringbus.

1.2.3  Ringbus  Arbiter 

The arbiter uses a rotating priority scheme to ensure that all requests eventually get granted. If the slices are numbered consecutively from $0$ to $S-1$, where $S$ is the number of slices, then the priority of slice $i$ is $\mathrm{pri}(i) = (i - n) \bmod S$, where $n$ is the current top priority slice (so $\mathrm{pri}(n) = 0$ and smaller values denote higher priority). A request is held at the top priority until it is granted, at which time $n$ is updated to the next slice in the counterclockwise direction that has a pending request. A number of algorithms may be used to grant any combination of lower priority requests that do not conflict with each other or with any grants (i.e. memory accesses) in progress. The particular algorithm used in this case grants a request only if it does not conflict with any requests at higher priority levels or with grants in progress. Only the direction requiring the smallest number of Ringbus segments is considered for granting the requests. In the case of a tie in the number of segments required in the clockwise and counterclockwise directions, the clockwise direction is chosen.

The arbiter incorporates a clever design. The Ringbus segments required for each request are determined from the destination of the request. Since requests are only granted in one direction, as mentioned earlier, there is no ambiguity in determining which segments are required. Each Ringbus segment is provisionally granted to a request; the request to which a particular segment is granted is determined by the priority of the requests. When a request has been granted all the segments that it requires, the request itself is granted.
Figure 1.7: Logic diagram of arbiter



A logic diagram of the arbiter is presented in Figure 1.7. The SN ROM determines the segments required for each request. Each of the SG ROMs, one for each segment, determines the request to which that segment is granted. The SG ROMs automatically grant a segment to all requests that do not require it. Thus the eight segment grant lines just need to be ANDed to determine whether the request has all the required segments. To prevent a "request" from being granted when there is in fact no request, the grant line is ANDed with the request line. Some additional logic bypasses the SG ROMs to prevent the interconnection of the required segments from being changed while a grant using them is still in progress.
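The segment-granting scheme can be mimicked in software. The sketch below is an illustrative model, not the arbiter's actual ROM contents: it assumes the segment sets have already been decoded for each request (the SN ROM's role), gives each free segment to the highest-priority request that needs it (the SG ROMs' role), and grants a request only when it holds all of its segments (the AND of the grant lines). The segment numbering, the `busy` set standing in for the bypass logic, and the function names are assumptions; the update of the top priority slice $n$ and the choice of direction are not modeled.

```python
def priority(i: int, n: int, S: int) -> int:
    """pri(i) = (i - n) mod S; 0 is the top priority (slice n)."""
    return (i - n) % S


def arbitrate(requests: dict, n: int, S: int, busy=frozenset()) -> set:
    """One arbitration pass in the style of the SG-ROM scheme.

    requests: maps each requesting slice to the set of Ringbus segments
              it needs (assumed already decoded).
    busy:     segments held by grants still in progress; never reassigned.
    Returns the set of slices granted in this pass.
    """
    owner = {}  # segment -> slice it is provisionally granted to
    for slice_id in sorted(requests, key=lambda i: priority(i, n, S)):
        for seg in requests[slice_id]:
            if seg not in busy and seg not in owner:
                owner[seg] = slice_id
    # Granted only if every required segment went to this request
    # (segments it does not need are implicitly "granted").
    return {s for s, segs in requests.items()
            if all(owner.get(seg) == s for seg in segs)}


reqs = {0: {0, 1}, 1: {1, 2}, 3: {3}}
print(arbitrate(reqs, n=0, S=8))            # slices 0 and 3 granted
print(arbitrate(reqs, n=0, S=8, busy={0}))  # only slice 3 granted
```

In the second call, slice 0 cannot be granted because segment 0 is busy, yet it still claims segment 1 and so blocks slice 1; this matches the rule that a request is granted only if it does not conflict with any higher-priority request.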


Figure 1.8: Ringbus arbiter timing (arbiter clock, request and grant signals)


The arbitration time for this arbiter is between two and three arbiter clock cycles. As indicated in Figure 1.8, once the requests are latched into the arbiter, one cycle is required for the arbitration and another cycle is required to decode and latch the grant lines.
1.3 Modeling Details
1.3.1  Processor  Model 

We assume a simple probabilistic model for each processor based on accesses to non-local memory (i.e. those memory locations which a processor can only access via the Multibus). We partition the operation of a processor into three phases: 1) processing, 2) waiting, and 3) accessing. (We add a fourth phase later.) The processing phase corresponds to the interval between the completion of one memory access via the Multibus and the request for the next memory access via the Multibus. (A processor must request the Multibus and be granted its use by the Multibus arbitration circuitry before a memory access may proceed.) Only local (i.e. HSB) memory accesses may occur during this interval. We consider the instructions for each processor to be stored mainly in its local memory. Thus we regard the operation of a processor as consisting of periods of processing (hence the name processing phase), where the processor is accessing instructions and data stored entirely within its local memory, punctuated by accesses to global memory for data and other instructions.

The waiting phase corresponds to the interval between the generation of a Multibus request and the initiation of the access corresponding to that request. A Multibus memory access from one processor may have to wait for the completion of other Multibus accesses before it can begin. The accessing phase corresponds to the interval during which a Multibus access is in progress by that processor: it is the entire duration for which the processor maintains uninterrupted mastership of the Multibus. These three phases correspond to the operation of a processor from the point of view of the Multibus.

The interval for which a processor is in the processing phase we call the processing time, denoted by $t_p$; the interval for which a processor is in the waiting phase we call the waiting time for a memory request, denoted by $t_w$; and the interval for which a processor is in the accessing phase we call the access time, denoted by $t_a$. One cycle of a processor, consisting of these three times, is depicted in Figure 1.9.

Figure 1.9: One cycle of a processor (processing time, waiting time, access time)

More precise definitions of $t_p$, $t_w$, and $t_a$ in terms of Multibus signals are given in section 2 of Appendix A. The waiting time, $t_w$, is defined so that it is always zero when there is only one processor on a Multibus. The delay of the Multibus arbitration circuitry is included in the access time.
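The processor cycle of Figure 1.9 can be explored numerically before any analysis. The sketch below is a small discrete-event simulation of several processors sharing one Multibus. For illustration only, it assumes exponentially distributed processing times, a constant access time that already includes the arbitration delay, and a cyclic round-robin grant order; these distributions and the function name are not taken from the thesis. With one processor it reproduces $t_w = 0$, as required by the definition above.

```python
import random


def simulate_multibus(n_procs, mean_tp, t_a, n_accesses=50_000, seed=1):
    """Estimate throughput and mean waiting time on one Multibus.

    Illustrative assumptions: exponential processing times with mean
    mean_tp, constant access time t_a (arbitration delay included),
    and cyclic round-robin grants.
    Returns (accesses per unit time, mean waiting time t_w).
    """
    rng = random.Random(seed)
    # Time at which each processor issues its next Multibus request.
    request_at = [rng.expovariate(1.0 / mean_tp) for _ in range(n_procs)]
    now = 0.0           # time at which the bus next becomes free
    last = n_procs - 1  # most recently granted processor
    total_wait = 0.0
    for _ in range(n_accesses):
        # If nobody is waiting, the bus idles until the next request.
        now = max(now, min(request_at))
        # Round-robin: first pending processor after `last`, cyclically.
        for k in range(1, n_procs + 1):
            j = (last + k) % n_procs
            if request_at[j] <= now:
                break
        total_wait += now - request_at[j]  # waiting phase of this access
        now += t_a                         # accessing phase: bus held for t_a
        last = j
        request_at[j] = now + rng.expovariate(1.0 / mean_tp)  # processing phase
    return n_accesses / now, total_wait / n_accesses


for n in (1, 4, 10, 20):
    thr, wait = simulate_multibus(n, mean_tp=9.0, t_a=1.0)
    print(f"{n:2d} processors: throughput {thr:.3f}, mean t_w {wait:.3f}")
```

As processors are added, the simulated throughput approaches one access per $t_a$ while the mean waiting time grows, which is the contention effect that the analytical Multibus model has to capture.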

We consider $t_p$, $t_w$, and $t_a$ to be random variables. $t_p$ and $t_a$ have given probability distributions which serve as inputs to the processor model. The probability distribution of $t_w$, which is determined by the contention for use of the Multibus, is the output. Given that a processor gains mastership of the Multibus for a memory access, we assume that the access requires use of the Ringbus with probability $\varphi$, in which case we call it a Ringbus access, and that it requires use of only the Multibus with probability $1 - \varphi$, in which case we call it a Multibus access. Given that a Ringbus access occurs, we assume that its destination is the global memory or a global register connected to Ringbus slice $i$ with probability $p_i^{RB}$, where

$$i = -(S/2 - 1), \ldots, -1, 1, 2, \ldots, S/2.$$

The number of slices is $S$, and $i$ denotes the position of a slice with respect to the one from which the access originates. Negative numbers indicate the counterclockwise direction and positive numbers indicate the clockwise direction around the Ringbus, relative to the slice originating the access. Thus $i = -2$ indicates the second slice along the Ringbus in the counterclockwise direction from the slice originating the access, and $i = 2$ indicates the second slice in the clockwise direction. We call the set of $p_i^{RB}$ the Ringbus destination probabilities. Since in most applications accesses to the global registers will be infrequent, we ignore accesses by a processor to the global registers in its own slice. We assume that all Ringbus accesses have the same access time distribution and that all Multibus accesses have the same access time distribution (which in general will differ from that for Ringbus accesses). The Ringbus access time distribution is an equivalent model of the entire Ringbus from the perspective of the Multibus (we talk about this more in section 1.2.5): it includes any waiting time imposed on a Ringbus access by the Ringbus arbiter.
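The probabilistic routing just defined can be sketched as a sampler. The code below is illustrative: it enumerates the admissible relative positions $i$ for an even number of slices $S$ and draws, for one Multibus tenure, either a Multibus access or a Ringbus access to position $i$. The parameter `phi` stands for the probability that an access uses the Ringbus, `p_rb` holds the Ringbus destination probabilities, and the uniform destination probabilities in the usage lines are an assumed example, not a measured workload.

```python
import math
import random


def ringbus_positions(S: int) -> list:
    """Relative destinations i = -(S/2 - 1), ..., -1, 1, 2, ..., S/2."""
    if S < 2 or S % 2:
        raise ValueError("S must be an even number of slices, at least 2")
    half = S // 2
    return list(range(-(half - 1), 0)) + list(range(1, half + 1))


def sample_access(phi: float, p_rb: dict, rng: random.Random):
    """Return ('Multibus', None) or ('Ringbus', i) for one bus tenure."""
    if not math.isclose(sum(p_rb.values()), 1.0):
        raise ValueError("Ringbus destination probabilities must sum to 1")
    if rng.random() >= phi:  # Multibus access with probability 1 - phi
        return ("Multibus", None)
    positions = list(p_rb)
    weights = [p_rb[i] for i in positions]
    return ("Ringbus", rng.choices(positions, weights=weights)[0])


# Usage: eight slices, every remote slice equally likely (an assumed workload).
S = 8
p_uniform = {i: 1 / (S - 1) for i in ringbus_positions(S)}
rng = random.Random(0)
draws = [sample_access(0.25, p_uniform, rng) for _ in range(10_000)]
ringbus_share = sum(kind == "Ringbus" for kind, _ in draws) / len(draws)
print(ringbus_positions(S))  # [-3, -2, -1, 1, 2, 3, 4]
print(ringbus_share)
```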

We have just assumed that all Multibus accesses have the same distribution. We now examine this assumption in more detail. In the absence of traffic on the HSB ports of the global memory boards, all Multibus accesses would actually have the same access time distribution. However, since the boards are dual-ported, traffic on one port of a memory board affects traffic on the other port. Thus Multibus accesses may have different access time distributions depending on the memory board accessed and the traffic intensity on the board's HSB port. There are two different cases to consider, depending on the destination of a Multibus access.

Case 1: The destination is a local memory, in which case some processor connects to the HSB port of the memory board. In this case the Multibus access time can be greatly affected by the HSB traffic on the local memory board from that processor (compare Figures A.4 and A.5 in Appendix A).

Case 2: The destination is a global memory. In this case the HSB port may either be unconnected or connected to the RIB. A comparison of Figures A.4 and A.6 reveals that the access time is essentially the same for these two choices of HSB connection.
We conclude that if the majority of Multibus accesses are to global memory, then the access time distribution is essentially the same for every access, as we assumed earlier. Finally, we note that a comparison of Figures A.9 and A.10 in Appendix A reveals that Ringbus access times are only slightly affected by the traffic intensity on the Multibus port of a global memory board.

We assume that reads and writes have the same access time distribution. This assumption is supported by the results in section 3.3 of Appendix A: for Multibus accesses, the access time distributions for reads and writes differ insignificantly, and for Ringbus accesses, the access time distributions for reads and writes differ significantly. We ignore read-modify-write accesses, since they usually occur infrequently compared to reads and writes. (The effect of read-modify-writes can be included by incorporating access times near that of read-modify-writes in the access time distribution for reads and writes.) We assume that byte and word accesses have the same access time distribution. This assumption is again supported by the results in section 3.3 of Appendix A.

Just as the traffic intensity on the HSB port of a memory board affects the Multibus access time of that board, the traffic intensity on the Multibus port of a memory board affects the HSB access time of that board. Since the processing time distribution implicitly includes the HSB access time of the associated local memory, the processing time distribution of a processor depends on the traffic intensity on the Multibus port of its local memory. However, since the processing time distribution is an exogenous input and possibly different for each processor (although we assume it to be the same for each processor in Chapters 2 and 3), we can simply accommodate any such dependencies by using an appropriate processing time distribution. In addition, the argument which we presented above for the access time distribution will work to some extent for the processing time distribution (we cannot be sure of the extent, since we have not made any measurements of the effect of Multibus port traffic on the processing time distribution).

The processor model presented so far is illustrated in Figure 1.10.


Figure  1.10:  Processor  model 
The one remaining embellishment of the processor model concerns long word accesses. The access of a 32 bit long word involves two consecutive word accesses on the 16 bit wide data paths of Concert. However, the two word accesses on the Multibus are not necessarily consecutive, since a processor does not maintain mastership of the Multibus between them. After the first word access of a long word completes, a processor waits some amount of time, which we call the recovery time, before requesting the Multibus for the second word access of the long word. Other processors may seize the Multibus in this time and cause the second word access to wait even if the first word access did not wait. Since a long word access consists of word accesses, we can certainly incorporate long word accesses in the processor model as presented so far. However, this may not be a good model, especially if a processor generates many long word accesses, since the processing times in such a model are not correlated with the first word access of a long word, whereas in reality the processing times are strongly correlated with the first word access of long words.

We add a fourth phase, recovery, to our processor model to create an alternate model for long word accesses. In this model we assume that, given that a processor gains mastership of the Multibus for a memory access, the access represents the first word of a long word access with probability $\beta$ and a regular byte or word access with probability $1-\beta$. Given that the access does represent the first word of a long word access, the processor generates a request for the second word of the long word after a recovery time denoted by $t_r$. This second word access has the same destination (Multibus or Ringbus slice $i$) as the first word access. Again, we assume that $t_r$ is a random variable with some given probability distribution. A more precise definition of $t_r$ in terms of Multibus signals is given in section 2 of Appendix A. This alternate processor model is illustrated in Figure 1.11.
Figure 1.11: Alternate processor model
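To illustrate the branching in the alternate model, the Python sketch below generates the sequence of Multibus accesses issued by one processor, ignoring all timing. It assumes, beyond what the text states, that each byte/word access or first word of a long word independently goes to the Ringbus with probability `psi` (slice `i` with probability `p_ring[i]`) and otherwise to the Multibus; the names `access_stream`, `psi`, and `p_ring` are hypothetical.

```python
import random
from itertools import islice
from typing import Iterator, Sequence, Tuple


def access_stream(beta: float, psi: float, p_ring: Sequence[float],
                  rng: random.Random) -> Iterator[Tuple[str, str]]:
    """Yield (destination, kind) for one processor's successive accesses.

    destination is "multibus" or "ring<i>"; kind is "word" for a byte or
    word access and "long1"/"long2" for the two words of a long word access.
    Hypothetical sketch: destinations are drawn independently per access.
    """
    slices = range(len(p_ring))
    while True:
        # Destination of a byte/word access or of the first word of a long word.
        if rng.random() < psi:
            dest = "ring%d" % rng.choices(slices, weights=p_ring)[0]
        else:
            dest = "multibus"
        if rng.random() < beta:
            yield dest, "long1"
            # After the recovery time t_r, the second word is requested for
            # the same destination as the first word.
            yield dest, "long2"
        else:
            yield dest, "word"


if __name__ == "__main__":
    print(list(islice(access_stream(0.5, 0.25, [0.5, 0.5], random.Random(0)), 6)))
```

Only the structure of the stream is modeled here; the processing, waiting, and recovery phases would be added by attaching sampled durations to each step.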
1.3.2  Major  Assumptions

The major assumptions which we make throughout this thesis are:

1. The random variables $t_p$ and $t_a$ for each processor are stationary (i.e. their probability distributions are independent of time). We also assume that the probabilities $\beta$, $\psi$, and $p_i^R$ for each processor are independent of time.

2. Concert is an ergodic system, i.e. long term time averages converge to the values computed for stochastic steady state.

3. Each processor model is entirely independent of all other processor models and everything else. More precisely, all processing and access time random variables, $t_p$ and $t_a$, are stochastically independent of each other and everything else. Also, all other probabilities, $\beta$, $\psi$, and $p_i^R$, are stochastically independent of each other and everything else.

4.  The  overall  model  of  Concert  is  in  stochastic  steady  state. 

The independence assumptions in 3 simplify the models. Various dependencies of the random variables can be included in the models (as discussed in section 2.10.4), but doing so increases the number of states and the complexity of the models. Furthermore, it is not clear at the present time what the dependencies are and how significant they are. Certainly factors such as the programs run on the system, the language in which the programs are expressed, and the distribution of the programs about the system influence the number and magnitude of the dependencies, but how does one intelligently express them in a model? Dealing with such questions and the various dependencies is beyond the scope of this thesis. Instead, we adopt a conservative approach: we assume that there are no dependencies and determine the performance as predicted by these simple models. Future research can be devoted to developing more detailed models that incorporate additional factors. The performance predicted by the models with the independence assumptions can be used to bound the performance predicted by the same models with dependencies. Thus the independence assumptions allow simple models that yield bounds on the performance of more complex models.

Ways to relax the assumptions in 1 and 3 are discussed in section 2.10 in relation to the Multibus model.

1.3.3  Factors  for  Study 

The factors we study in this research are:

1.  The  processing  time  distribution. 

2. The Multibus access time distribution. (The Ringbus access time distribution is an equivalent model of the entire Ringbus from the perspective of a processor on a Multibus, and thus it is dictated by the Ringbus. However, we do consider it as a factor for study in conjunction with the Multibus model in section 2.9.)

3. The probability of a Ringbus access, $\psi$, and the Ringbus destination probabilities $p_i^R$. We also consider the probability of a long word access, $\beta$, when using our alternate processor model.

4. The number of processors on a Multibus, i.e. in a slice.

5. The number of slices.

6. The Ringbus access paths.

7. The Ringbus arbitration algorithm.

1.3.4  Overall  Performance  Metric 

We use throughput as the performance metric of the overall model. We regard the throughput of a processor as the number of Multibus and Ringbus accesses completed per unit time. Thus the throughput of a processor is equal to $1/\bar t_{cyc}$, where $\bar t_{cyc}$ is the mean cycle time given by

$$\bar t_{cyc} = \bar t_p + \bar t_{w1} + \beta\,\bar t_{w2} + (1+\beta)\big((1-\psi)\,\bar t_{amb} + \psi\,\bar t_{arb}\big).$$

Here $\bar t_{w1}$ denotes the mean waiting time per Multibus request for a byte, word, or first word of a long word access, and $\bar t_{w2}$ denotes the mean waiting time per Multibus request for the second word of a long word access. $\bar t_{amb}$ and $\bar t_{arb}$ denote the mean access times for Multibus and Ringbus accesses respectively. The total throughput is thus

$$T = \sum_{i \in P} \frac{1}{\bar t_{cyc,i}},$$

where $\bar t_{cyc,i}$ is the mean cycle time for processor $i$ and $P$ is the set of all processors.
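The two formulas above can be evaluated directly. The following Python sketch is a transcription of them, not code from the thesis; the function names are hypothetical, and all times are assumed to be expressed in the same unit (e.g. microseconds).

```python
from typing import Iterable


def mean_cycle_time(t_p: float, t_w1: float, t_w2: float,
                    t_amb: float, t_arb: float,
                    beta: float, psi: float) -> float:
    """Mean processor cycle time, term by term as in the formula above."""
    # Mean time of one word access: Multibus w.p. 1 - psi, Ringbus w.p. psi.
    t_access = (1.0 - psi) * t_amb + psi * t_arb
    # Processing, first wait, second-word wait (w.p. beta), and on average
    # 1 + beta word accesses per cycle.
    return t_p + t_w1 + beta * t_w2 + (1.0 + beta) * t_access


def total_throughput(cycle_times: Iterable[float]) -> float:
    """Total throughput T: sum of 1 / t_cyc,i over all processors i in P."""
    return sum(1.0 / t for t in cycle_times)


if __name__ == "__main__":
    t_cyc = mean_cycle_time(1.0, 0.5, 1.0, 1.0, 3.0, 0.5, 0.5)
    print(t_cyc, total_throughput([t_cyc] * 4))
```

For example, `mean_cycle_time(1.0, 0.5, 1.0, 1.0, 3.0, 0.5, 0.5)` evaluates to 5.0, so four identical processors with these (illustrative) parameters give $T = 0.8$ accesses per unit time.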

1.3.5  Decomposition  and  Integration 

We divide the overall Concert system into a number of subsystems: one for each Multibus and one for the Ringbus. Each Multibus subsystem consists of all the processors, local memories, and global memories connected to the Multibus. The Ringbus subsystem consists of the Ringbus arbiter and everything connected to the RIMs except for the Multibuses. This definition of the subsystems is illustrated in Figure 1.12.


Figure 1.12: Subsystem definitions (labels: Multibus, L.M., G.M., Ringbus)


Note that the global memory module connected to each RIM is included both in the subsystem for the corresponding Multibus and in the subsystem for the Ringbus; we view it as being shared by the two subsystems. Thus there are two points of interaction between each Multibus subsystem and the Ringbus subsystem: the Multibus connection to the RIM and the global memory connected to the RIM. However, the interaction through the global memory connected to the RIM is especially weak. Measurements reported in section 3.3 of Appendix A reveal that the access time distribution for accesses via one port of the global memory connected to the RIM is hardly affected by heavy loading on the other port of the global memory. (Compare Figures A.4 and A.6 and Figures A.9 and A.10.) We ignore the interaction between Multibus and Ringbus subsystems through global memory in the rest of this thesis. The single remaining point of interaction between each Multibus subsystem and the Ringbus subsystem falls on a natural hierarchical boundary and thus represents a natural demarcation point between the subsystems.

Figure 1.13 gives an abstract view of the overall system.

Figure 1.13: Abstract view of Concert


We can regard each subsystem as a black box. Each black box can be represented by an equivalent lumped model, just as a black box in an electrical circuit can be replaced by its Thevenin equivalent circuit. The Thevenin equivalent model of the Multibus subsystem is a single processor model of the sort described in section 1.2.1. This single processor represents the characteristics of the Ringbus accesses from the entire Multibus subsystem. We call the interval between the completion of one access on the Multibus with a Ringbus destination and the start of the next access on the Multibus with a Ringbus destination the Ringbus spacing; the processing time distribution of the single processor equivalent of the Multibus is then equal to the probability distribution of the Ringbus spacing. We make no distinction between word and long word accesses for the Ringbus spacing, so we take $\beta = 0$ for the single processor. The probability of choosing Ringbus destination $i$ in the single processor model, which we denote by $p_i$, is equal to the probability that a Ringbus access in the Multibus subsystem is for destination $i$. Finally, we have $\psi = 1$ for the single processor equivalent. The access time distribution is given by the Ringbus model. The Thevenin equivalent model of the Ringbus subsystem is an access time distribution for each Multibus-RIM connection. This access time distribution for a connection is the distribution of the time from the occurrence of a Ringbus request to the completion of that Ringbus access, taken over all Ringbus requests on that connection.
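To make the definition of the Ringbus spacing concrete, the Python sketch below estimates the parameters of the single processor equivalent from a trace of Multibus accesses. The trace format (a time-ordered list of `(start, end, dest)` tuples, with `dest` a Ringbus destination index or `None` for a Multibus access) and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (start time, completion time, Ringbus destination index or None for Multibus)
Access = Tuple[float, float, Optional[int]]


def single_processor_equivalent(
        trace: List[Access]) -> Tuple[List[float], Dict[int, float]]:
    """Return Ringbus spacing samples and destination probabilities p_i.

    The trace is assumed to be ordered by start time, as accesses on a single
    bus are. The equivalent processor also has beta = 0 and psi = 1.
    """
    ring = [(s, e, d) for s, e, d in trace if d is not None]
    # Spacing: completion of one Ringbus-bound access to the start of the next.
    spacings = [ring[k + 1][0] - ring[k][1] for k in range(len(ring) - 1)]
    counts = Counter(d for _, _, d in ring)
    probs = {d: c / len(ring) for d, c in sorted(counts.items())}
    return spacings, probs


if __name__ == "__main__":
    trace = [(0.0, 1.0, None), (1.0, 3.0, 0), (3.0, 4.0, None),
             (5.0, 7.0, 1), (8.0, 9.0, 0)]
    print(single_processor_equivalent(trace))
```

The spacing samples approximate the processing time distribution of the equivalent processor; their mean is the single number needed later when that distribution is approximated as exponential.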

We decompose the overall model of Concert into Multibus and Ringbus models. As shown in Figure 1.14, Thevenin equivalent models are used to represent the other models connected to a particular model.
Figure 1.14: Decomposition into models

Given some Ringbus access time distribution, the Multibus model can be analyzed. Likewise, given some processing time distribution and Ringbus destination probabilities, the Ringbus model can be analyzed. However, the solutions of these decomposed models do not necessarily correspond to the solutions of the subsystems in the overall system, since the models are dependent. The Ringbus access time distribution is given by the Ringbus model, which depends on the single processor model of the Multibus. The single processor model of the Multibus is given by the Multibus model, which depends on the Ringbus access time distribution. Integration is the process of solving the models such that all these dependencies are satisfied. In a sense, integration amounts to matching the boundary conditions (i.e. interactions) between each pair of models to obtain a coherent overall model.

We perform the integration iteratively. First we assume some Multibus single processor model and some Ringbus access time distribution. Then we solve the Multibus and Ringbus models to obtain a new Multibus single processor model and a new Ringbus access time distribution. We analyze the models again to obtain updated models and repeat until the improvement on successive iterations is sufficiently small. We do not discuss the existence and uniqueness issues associated with integration. It should be clear later that in our case integration leads to a unique solution.

We  make  a  number  of  assumptions  and  approximations  to  simplify  integrating  the  models: 

1. We assume that the Multibus models are identical in every respect: each has the same number of processors and all the processors are identical.

2. We assume that the Ringbus model is symmetrical with respect to each Multibus.

These two assumptions mean that only one Multibus model (and the Ringbus model) needs to be involved in the integration.

3. We approximate the processing time distribution of the single processor model of the Multibus by an exponential distribution.

4. We approximate the Ringbus access time distribution by an exponential distribution.

These two approximations ease the analysis of the models. Since an exponential distribution is completely specified by its first moment, these two approximations also considerably ease the integration of the models, since the integration now effectively reduces to first moment matching (i.e. we just have to determine the mean processing time of the single processor model of a Multibus and the mean Ringbus access time).
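Under approximations 3 and 4, the iteration described earlier only has to pass two means between the models. The Python sketch below shows that first-moment fixed-point iteration; `multibus_model` and `ringbus_model` are hypothetical stand-ins for the model solutions of Chapters 2 and 3, and the toy functions in the demo are arbitrary and not derived from Concert.

```python
from typing import Callable, Tuple


def integrate(multibus_model: Callable[[float], float],
              ringbus_model: Callable[[float], float],
              spacing0: float, access0: float,
              tol: float = 1e-9, max_iter: int = 1000) -> Tuple[float, float]:
    """Iterate on (mean Ringbus spacing, mean Ringbus access time).

    multibus_model: mean Ringbus access time -> mean processing time (Ringbus
        spacing) of the single processor equivalent of a Multibus.
    ringbus_model: mean Ringbus spacing -> mean Ringbus access time.
    """
    spacing, access = spacing0, access0
    for _ in range(max_iter):
        # Solve both models with the values from the previous iteration.
        new_spacing = multibus_model(access)
        new_access = ringbus_model(spacing)
        if abs(new_spacing - spacing) < tol and abs(new_access - access) < tol:
            return new_spacing, new_access
        spacing, access = new_spacing, new_access
    raise RuntimeError("integration did not converge")


if __name__ == "__main__":
    # Toy models for illustration only (not the Concert models).
    print(integrate(lambda a: 5.0 + 0.5 * a, lambda s: 1.0 + 2.0 / s,
                    spacing0=1.0, access0=1.0))
```

The simultaneous update mirrors the text: both models are re-solved from the previous iterate, and the loop stops when successive iterates change by less than `tol`. Convergence is not guaranteed in general; the `RuntimeError` makes a failure visible rather than silently returning a poor estimate.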

Of course, these assumptions and approximations limit the applicability and accuracy of the integration. The accuracy of the performance predictions obtained via integration of the models is assessed by comparison with simulations.

1.4  Previous  Work

Single bus multiprocessors like the Multibus subsystem have been studied by many. The basic queueing system formulation of the Multibus model in Chapter 2 has appeared and has been studied in many guises. It appeared as a machine repairman model as early as 1935 [K2]. With the advent of Kleinrock's popular volume [K3], the M/M/1//N model of the basic queueing system has become a classic. Jaiswal's [J1], or alternately Benson and Cox's [B2], solution of the M/D/1//N model is also well known. The theory of product form queueing networks which we apply is well known, although we utilize Kelly's powerful and elegant quasi-reversibility approach [K1] to queueing networks rather than the better known local balance BCMP approach [B1].

We are not aware of other studies dealing with our particular extensions to the basic queueing system model of the Multibus. However, the extensions are simple and the results we obtain follow from straightforward application of product form queueing network theory, so others may have derived similar results. The specific recursive solution technique we discuss for the PH/PH/1//N model is, to the best of our knowledge, new, although Herzog, Woo, and Chandy [H2] have already outlined the solution of general queueing systems by recursive methods.

The Ringbus subsystem, on the other hand, is a novel interconnection scheme which, to the best of our knowledge, was not studied (or conceived) before Anderson [A2]. Anderson focused on the design of a workable Ringbus: he only performed the most rudimentary simulations (see footnote in section 1.1). We study the optimum performance obtainable with a Ringbus. We formulate the Ringbus arbitration problem as a Markovian decision problem and treat it by the well known techniques of Howard [H4] and Odoni [O2].

The decomposition/integration approach to modeling Concert was inspired by Courtois [C5]. The techniques applied in this approach are standard.

1.5  Overview  of  Thesis

We study the Multibus model in detail in Chapter 2 and lay the foundation in section 2.9 for later integration with the Ringbus model. In Chapter 3 we study the Ringbus model. We concentrate mainly on the optimum performance of the Ringbus and the arbitration algorithm which achieves this performance. In Chapter 4 we integrate the Multibus and Ringbus models and make a few performance predictions to demonstrate the integration technique. We compare these predictions to simulation results. In the remainder of Chapter 4, we present the results of computer simulations of the overall Concert model.


Chapter  2 
2.1  Introduction

In this chapter we study the Multibus subsystem in detail. We use the processor model described in section 1.3 to construct various increasingly complex models of the Multibus. We assume, as mentioned in section 1.3.2, that all processor models are stationary and independent. To ease analysis, we assume in addition that all processor models are identical in every respect. The extension to non-identical processors, discussed in section 2.10.1, is straightforward but increases the complexity of the analysis without necessarily contributing much insight.

When all processors are identical, the mean cycle time of a processor, $\bar t_{cyc}$, is the same for every processor. (This follows from symmetry arguments.) Thus the throughput of the Multibus is given by $N/\bar t_{cyc}$, where $N$ is the number of processors and

$$\bar t_{cyc} = \bar t_p + \bar t_{w1} + \beta\,\bar t_{w2} + (1+\beta)\big((1-\psi)\,\bar t_{amb} + \psi\,\bar t_{arb}\big).$$

$\bar t_{w1}$ denotes the mean waiting time per Multibus request for a byte, word, or first word of a long word access, and $\bar t_{w2}$ denotes the mean waiting time per Multibus request for the second word of a long word access. $\bar t_{amb}$ and $\bar t_{arb}$ denote the mean access times for Multibus and Ringbus accesses respectively.

Since $\bar t_{w1}$ and $\bar t_{w2}$ are the only parameters determining the throughput of the Multibus that are not exogenous inputs to the Multibus model, the performance metric for the Multibus effectively reduces to the pair $(\bar t_{w1}, \bar t_{w2})$. In this chapter we take the performance metric to be the mean total waiting time per cycle, defined by $\bar t_{wt} = \bar t_{w1} + \beta\,\bar t_{w2}$. This gives a single quantity for the performance, as with throughput, and is more closely related to the Multibus models than throughput.

All the processors on a Multibus and the Multibus arbitration circuitry are synchronized by a master clock with a 100 nsec period (one master clock per Multibus). Thus the Multibus subsystem inherently operates in discrete time. We model this discrete time operation with continuous time models to take advantage of the simple, powerful, and well developed modeling methods available in continuous time, such as product form queueing networks. It is argued in the following paragraphs that there is not much loss of precision in this approach.

We are not interested in modeling the Multibus at the level of the Multibus clock. Such detail is unnecessary for our purposes. Furthermore, any model based on the state of the Multibus at every rising edge of the Multibus clock would be unwieldy due to the large number of such states required. Rather, we are interested in modeling the Multibus at the event level. We define an event to be a request for a Multibus access or the completion of a Multibus access. (We do not consider the initiation of a Multibus access to be an event, since it is equivalent either to a request for a Multibus access, if there are no other Multibus accesses pending or in progress, or to the completion of a Multibus access, if a Multibus access is in progress. Similarly, we do not consider the initiation or completion of processing to be an event, since they are equivalent respectively to the completion of a Multibus access and a request for a Multibus access.) Because the Multibus actually operates in discrete time synchronous with the rising edges of the Multibus clock, the time between successive events is some integer multiple of 100 nsec, and two or more events can occur simultaneously. In modeling the Multibus in continuous time at the event level, we make the following two approximations.

1) We assume that the time between successive events can take on continuous values.

2) We assume that only one event can occur at a time.

The first approximation introduces a maximum error of ±50 nsec in interevent times. Since in the actual Multibus the processing time is at least several hundred nsec and the access time is at least 1000 nsec (see Appendix A), the loss of precision introduced by the first approximation is small. For the second approximation, we note that the probability of two or more events occurring in the same Multibus clock period is small. Thus there will probably be only a very small loss of precision due to the second approximation. Therefore not much loss of precision should be introduced by electing to model the Multibus in continuous time.

The Multibus subsystem can be modeled as a queueing system with a finite number of customers. Consider the case in which $\psi = 0$ and $\beta = 0$ (i.e. only Multibus accesses and no explicit treatment of long word accesses) for each processor model. Denote the number of processors by $N$. We can represent the operation of each processor by a customer which visits service centers (servers). Once a customer arrives at a server, it remains there for a period of time governed by the service time probability distribution for that server. Let there be $N$ servers, called processor servers since they represent the $N$ processors, each with an identical service time distribution equal to the processing time distribution. Let there be one server, called the Multibus server since it represents Multibus accesses, with its service time distribution equal to the Multibus access time distribution. (Since all Multibus accesses have the same access time distribution, it is sufficient to have just one server to represent a Multibus access.) Finally, let there be no more than one customer in service at a server at any instant, and let there be $N$ customers.

Each of the $N$ customers behaves as follows. A customer visits a processor server and remains there for some processing time, after which it joins a queue of other customers waiting to visit the Multibus server. When the customer eventually visits the Multibus server, it remains there for some access time and then returns to the same processor server.

This processor-queue-Multibus cycle of a customer represents the processing-waiting-accessing cycle of the processor model (with $\psi = 0$ and $\beta = 0$). The finite customer queueing system is pictured in Figure 2.1 below. The circles represent servers.


Figure 2.1: Finite customer queueing system (label: Processors)

To faithfully model the operation of the Multibus arbitration circuitry, the queueing discipline at the Multibus server should be round-robin. However, to ease analysis, we will assume that this queueing discipline is first-come-first-served (FCFS). Interestingly, there is no loss of precision with this assumption. Since the Multibus server is work-conserving (i.e. the server is always busy while there remains work for it to do) and since all customers are identical (i.e. same processing and access time distribution for each customer), the mean waiting time per access on the Multibus, $\bar t_w$,* is the same for both queueing disciplines [M1]. Of course, the waiting time distributions will not necessarily be the same (intuitively, one expects the variance of the waiting time to be greater with the FCFS discipline than with the round-robin discipline), but this does not matter, since our performance metric depends only on the mean waiting time, $\bar t_w$.

* $\bar t_w$ is the mean waiting time per access for any access on the Multibus: byte, word, first word of a long word, and second word of a long word. If $\beta = 0$, $\bar t_w = \bar t_{w1}$. In general $\bar t_{w1} \neq \bar t_{w2}$, so
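The basic queueing model can also be checked by direct simulation. The Python sketch below simulates $N$ customers cycling between their processor servers and a single Multibus server and reports the mean waiting time per request, $\bar t_w$. Requests are served strictly in order of request time, i.e. the FCFS discipline assumed above rather than the round-robin arbitration of the real Multibus. The function name, the sampler arguments `sample_tp` and `sample_ta`, and the warm-up length are assumptions for illustration.

```python
import heapq
import random
from typing import Callable


def mean_wait_fcfs(n: int, sample_tp: Callable[[], float],
                   sample_ta: Callable[[], float],
                   n_accesses: int = 100_000, warmup: int = 1_000) -> float:
    """Estimate t_w for n processors sharing one FCFS Multibus server.

    Requires n_accesses > warmup; the first `warmup` accesses are discarded
    so that initial transients do not bias the estimate.
    """
    # Pending requests as (request time, processor index); every processor
    # starts in its processing phase at time 0.
    pending = [(sample_tp(), i) for i in range(n)]
    heapq.heapify(pending)
    bus_free_at = 0.0
    total_wait, counted = 0.0, 0
    for k in range(n_accesses):
        t_req, i = heapq.heappop(pending)   # earliest request: FCFS order
        start = max(t_req, bus_free_at)     # wait while the bus is busy
        bus_free_at = start + sample_ta()   # the access completes
        if k >= warmup:
            total_wait += start - t_req
            counted += 1
        # The processor returns to processing, then issues its next request.
        heapq.heappush(pending, (bus_free_at + sample_tp(), i))
    return total_wait / counted


if __name__ == "__main__":
    rng = random.Random(0)
    # Exponential processing and access times with mean 1, N = 2 (M/M/1//2).
    print(mean_wait_fcfs(2, lambda: rng.expovariate(1.0),
                         lambda: rng.expovariate(1.0)))
```

Because each new request is issued after the previous access completes, popped request times never decrease, so the heap order is exactly FCFS order. With deterministic samplers the estimate reproduces the steady state of section 2.2; with exponential samplers it approximates the M/M/1//N value.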

We call the finite customer queueing system depicted in Figure 2.1, with an FCFS queueing discipline, the basic queueing model of the Multibus. In later sections we extend this basic queueing model, known as the machine repairman model in the queueing theory literature, to accommodate $\psi \neq 0$ and $\beta \neq 0$. A convenient notation to describe the basic queueing model is $S_1/S_2/1//N$. $S_1$ and $S_2$ represent symbols denoting, respectively, the processing and access time distributions. The 1 indicates a single server queue and $N$ indicates the total number of customers. Some commonly used symbols are M for memoryless (i.e. exponential), D for deterministic, $E_r$ for $r$-stage Erlangian, and G for general. Thus M/M/1//N denotes a basic queueing model with exponential processing and access times and $N$ processors.
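For the M/M/1//N model the mean wait has a well known closed form, obtained from the birth-death (machine repairman) state probabilities and Little's law. The Python sketch below is one standard way to compute it; it is not the derivation of section 2.4, and the function name is hypothetical.

```python
def mm1n_mean_wait(n: int, t_p: float, t_a: float) -> float:
    """Mean FCFS waiting time per request, t_w, in the M/M/1//N model.

    t_p and t_a are the mean processing and access times.
    """
    rho = t_a / t_p
    # Unnormalized probabilities of k customers at the Multibus:
    # q_k = N! / (N - k)! * rho**k, built up incrementally from q_0 = 1.
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) * rho
        total += term
    p0 = 1.0 / total               # probability that the Multibus is idle
    throughput = (1.0 - p0) / t_a  # completed accesses per unit time
    # Little's law on the closed loop: n = throughput * (t_p + t_w + t_a).
    return n / throughput - t_p - t_a


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, mm1n_mean_wait(n, 1.0, 1.0))
```

For $N = 2$ and $t_p = t_a = 1$ this gives $\bar t_w = 0.5$ (up to floating-point rounding), and for $N = 1$ it gives zero, as it must.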

A rather exhaustive analytical treatment of the basic queueing model with different processing and access time distributions is presented in sections 2.2 through 2.7. Section 2.2 deals with deterministic processing and access times. Section 2.3 characterizes the general behaviour of $\bar t_w$ for probabilistic processing and access times. Sections 2.4, 2.5, and 2.6 develop results for the M/M/1//N, M/G/1//N, and G/M/1//N models respectively. Most of section 2.6 is devoted to describing the known results for a class of queueing networks with convenient product form solutions. These results are heavily utilized in sections 2.8 and 2.9. Section 2.7 presents a recursive technique for handling general processing and access time probability distributions. This is believed to be the first demonstration of a reasonable solution method specifically for the G/G/1//N model.


Generalizations of the basic queueing model to handle $\beta \neq 0$ and $\psi \neq 0$ are covered in sections 2.8 and 2.9. Section 2.8 treats the case with $\beta \neq 0$ and $\psi = 0$, and section 2.9 treats the general case with $\beta \neq 0$ and $\psi \neq 0$. Section 2.9 discusses the decomposition of Concert into Multibus and Ringbus models and develops the hooks for the later integration of these two models. Specifically, the single processor equivalent of the Multibus is developed and relations yielding its parameters are derived.


Lastly, section 2.10 discusses the relaxation of the four major assumptions of 1) identical processors, 2) simple processor model, 3) stationary processor model, and 4) independent processors. The most important sections in Chapter 2 are 2.2, 2.3, 2.4, 2.8, and 2.9. Section 2.6 is also important, but only as a primer on product form solutions of queueing networks for sections 2.8 and 2.9. Sections 2.5 and 2.7 are, in some sense, icing on the cake.
2.2  Deterministic  Model

In this first model, both $t_p$ and $t_a$ are deterministic quantities.

Initially the independent processors are unsynchronized. However, due to the determinism of $t_p$ and $t_a$, every time two or more memory requests occur while the bus is in use, the processors originating those requests are synchronized with each other and with the processor currently using the bus. The synchronization does not occur at the instant of conflict, but rather at the instant the access in progress terminates and the request at the head of the FCFS queue waiting for the bus begins its access. At this instant, the two respective processors are synchronized so that the cycle of the one just beginning its access lags the other by exactly $t_a$. Similarly, each processor which has a request in the queue is synchronized so as to lag exactly $t_a$ behind the processor of the previous access. Since $t_p$ is also deterministic and the same for every processor, the synchronized processors will make their next requests at intervals of $t_a$.


Theorem 2.1

With independent identical processors with deterministic processing time $t_p$ and deterministic access time $t_a$ served by a single bus in FCFS order, the waiting time per request after at most two cycles of every processor is the same for every request. Moreover, after at most two cycles of every processor, the FCFS queue is either always empty or always nonempty at the instant a request arrives at the queue.

The proof of this theorem is given in Appendix B.


By construction $t_w = 0$ when $N$, the number of processors on the Multibus, is one. Let $N^*$ be
defined as the saturation point: in the steady state for $N \le N^*$, $t_w = 0$ (corresponding to the
queue always empty when a request arrives), and for $N > N^*$, $t_w > 0$ (corresponding to the
queue always nonempty when a request arrives). This saturation point is the maximum number of
processors for given $t_p$ and $t_a$ that the bus can support in steady state and maintain $t_w = 0$.

The maximum number of processors that the bus can handle with zero wait time for a
request is one (for the bus in use) plus the maximum number of additional processors that can be
processing, but not waiting, while the one processor is currently using the bus. This maximum
number of additional processors is given by $\lfloor t_p / t_a \rfloor$.† Thus $N^* = \lfloor t_p / t_a \rfloor + 1$.

† $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.

For each processor added above $N^*$, all processors will share equally (after initial transients
die out) the wait incurred by the addition of each processor above the saturation point, if all pro-
cessors are identical and bus arbitration is FCFS. To find $t_w$ for this case, we may equate the
arrival rate of requests to the bus system to the service rate of requests at the bus system. We
have then:

$$\frac{N}{t_p + t_a + t_w} = \frac{1}{t_a}$$

from which we obtain $t_w = N t_a - (t_p + t_a)$.

The wait per request normalized by the access time is

$$\frac{t_w}{t_a} = N - \left(\frac{t_p}{t_a} + 1\right).$$


At this point (and in the sequel) it is more convenient to consider $N^*$ and $N$ as continuous rather
than discrete quantities. The saturation point is thus redefined as $N^* = t_p / t_a + 1$. Although the
discussion will consider $N$ and $N^*$ as continuous quantities, these quantities should be understood to
be in fact discrete whenever they are given a physical interpretation.

Substituting for $N^*$, we obtain

$$\frac{t_w}{t_a} = \begin{cases} 0, & N \le N^* \\ N - N^*, & N > N^* \end{cases}$$

which completely describes the behavior of $t_w$ in the steady state for the deterministic case (see Figure 2.2).
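The deterministic result lends itself to a quick numerical check. The Python sketch below is an illustrative event-driven simulation of the D/D/1//N system under the assumptions of this section (identical processors, constant $t_p$ and $t_a$, one FCFS bus). The function names and the choice to start every processor at time 0 are assumptions of the sketch, not part of the model description. Because Theorem 2.1 limits the start-up transient to at most two cycles, averaging over the second half of a long run should reproduce $\max(0, N - N^*)$.

```python
import heapq


def deterministic_wait_closed_form(n, t_p, t_a):
    """Normalized steady-state wait t_w/t_a = max(0, N - N*), N* = t_p/t_a + 1."""
    n_star = t_p / t_a + 1.0
    return max(0.0, n - n_star)


def simulate_ddn(n, t_p, t_a, cycles=50):
    """Event-driven sketch of n identical processors sharing one FCFS bus.

    Processing time t_p and access time t_a are constants. Every processor
    starts processing at time 0, so the first requests all collide; the
    return value is the mean wait (normalized by t_a) over the second half
    of the run, well after the start-up transient.
    """
    # One pending request per processor: (request time, processor id).
    # Ties are broken by processor id, which keeps the run reproducible.
    pending = [(t_p, i) for i in range(n)]
    heapq.heapify(pending)
    bus_free = 0.0
    waits = []
    for _ in range(cycles * n):
        arrival, proc = heapq.heappop(pending)   # earliest request = FCFS head
        start = max(arrival, bus_free)           # wait while the bus is busy
        waits.append(start - arrival)
        bus_free = start + t_a                   # the access holds the bus for t_a
        heapq.heappush(pending, (bus_free + t_p, proc))
    steady = waits[len(waits) // 2:]
    return sum(steady) / len(steady) / t_a
```

For instance, with $t_p = 3$ and $t_a = 1$ ($N^* = 4$), `simulate_ddn(6, 3.0, 1.0)` settles at a normalized wait of 2, matching $N - N^* = 6 - 4$.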
2.3 Probabilistic Model - General Behaviour


We now consider $t_p$ and $t_a$ to be stationary random variables with given probability distribu-
tions. We assume that the random variable $t_p$ for each processor and the random variables $t_a$ are
independent of each other and of all other random variables, as discussed in section 2.1. We also
make the reasonable assumption that the random variables $t_p$ and $t_a$ have finite means, i.e.
$E[t_p] < \infty$ and $E[t_a] < \infty$.


In addition to these assumptions and those in section 1.3.2, we make the following existence
and ergodicity assumptions in this section.


Existence and Ergodicity Assumptions


1. We assume that the mean waiting time per request, $\bar t_w$, exists. More precisely, we assume that a
stationary probability distribution exists for $t_w$ (since $\bar t_w$ is defined in terms of its probability
distribution function). Let the waiting time of the $n$th request to enter the queue be denoted
by $t_{w_n}$, so that $\{t_{w_n}\}$, $n \ge 1$, is the sequence of the waiting times of successive requests. The
assumption means that $\lim_{n \to \infty} \Pr(t_{w_n} \le y)$ exists and equals some function $W(y)$, where
$\Pr(t_{w_n} \le y)$ is the probability distribution of the waiting time of the $n$th request and $W(y)$ is
the stationary probability distribution for $t_w$.


2. We assume that the waiting time process is ergodic so that ensemble averages equal (discrete)
time averages, i.e. we assume that $\bar t_w = \lim_{k \to \infty} \frac{1}{k} \sum_{n=1}^{k} t_{w_n}$.


3. We assume that the time averages necessary for any application of Little's Law to the queueing
system described in section 2.1 exist. Little's Law is the following statement:

Consider any system at which customers arrive, spend time in the system, and
depart. Let $N(t)$ be the number of arrivals at the system in the interval $[0, t)$, $l(t)$ be
the number of customers in the system at time $t$, and $w_k$ be the time spent in the
system by the $k$th customer to arrive. If the following limits exist and are finite

$$\lambda = \lim_{t \to \infty} \frac{N(t)}{t}, \quad \text{average arrival rate}$$

$$L = \lim_{t \to \infty} \frac{1}{t} \int_0^t l(s)\,ds, \quad \text{average number in system}$$

$$W = \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} w_i, \quad \text{average time in system}$$

then $L = \lambda W$ [S3].


These assumptions are necessary to ensure that the results developed in this section are
strictly correct. All the following sections in this chapter deal with specific probability distributions
and/or specific situations for which these assumptions are valid in all cases; thus it is unnecessary
to state them in the sequel. However, this section deals with unspecified general distributions for
which it is difficult to show that these assumptions are valid in all cases.

The purpose of the first assumption is straightforward: $\bar t_w$ must exist before we can talk
about it. The second assumption ensures that the average waiting time derived from an application
of Little's Law equals $\bar t_w$. The third assumption ensures that it is valid to apply Little's Law. Note
that if the time averages in this third assumption exist, then they must be finite since we are deal-
ing with a closed queueing system. If one is willing to deal with a time average for the waiting
time per request rather than an ensemble average (i.e. a mean), then only the third assumption is
necessary. We present and prove some conditions in Appendix D for which the three assumptions
are valid.

We now consider the general behaviour of the mean waiting time per request, $\bar t_w$, subject to
the preceding assumptions. For a single processor we still have $\bar t_w = 0$. We can derive a general for-
mula for $\bar t_w$ with $N$ processors using Little's Law.

Let $\bar n$ denote the mean number of requests queued for service and currently in service on
the bus. Let $\bar n_p$ denote the mean number of processors which are processing (i.e. which do not
have an outstanding request). Let $\rho$ denote the probability of the server (i.e. the bus) being busy.
Let $\lambda_{eff}$ denote the mean arrival rate of requests to the bus. Since the system is closed with a fin-
ite number of requests, $\lambda_{eff}$ is also the mean service rate of requests.


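Then by Little's Law we have $\bar n = \lambda_{eff}(\bar t_w + \bar t_a)$. Applying Little's Law twice more we have
$\bar n_p = \lambda_{eff}\,\bar t_p$ and $\rho = \lambda_{eff}\,\bar t_a$. Since $\bar n + \bar n_p = N$, we thus have
$\lambda_{eff}(\bar t_w + \bar t_a + \bar t_p) = N$ and $\lambda_{eff} = \rho / \bar t_a$, yielding

$$\frac{\bar t_w}{\bar t_a} = \frac{N}{\rho} - N^*$$

where we now define $N^* = \bar t_p / \bar t_a + 1$. This same result can be obtained by considering the
throughput balance equation

$$\frac{N}{\bar t_p + \bar t_a + \bar t_w} = \frac{\rho}{\bar t_a}.$$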


It follows from the definition given above that $0 < \rho \le 1$, and thus $\bar t_w / \bar t_a \ge N - N^*$. For the
deterministic case with $N > N^*$, $\rho = 1$, and thus the lower bound for $\bar t_w / \bar t_a$ is achieved by the
deterministic case. Note that as $N \to \infty$, $\rho \to 1$, thus yielding the same asymptotic behaviour as
derived in section 2.2. For $N \le N^*$, we have that $\bar t_w / \bar t_a \ge 0$, and this lower bound is again
achieved by the deterministic case. Therefore

$$\frac{\bar t_w}{\bar t_a} \ge \max(0,\; N - N^*)$$

where the lower bound is achieved by the deterministic case. We summarize this result as a
lemma:

Lemma 2.1

The mean waiting time per request in the previously described queueing system model with
stationary processing and access times with means $\bar t_p < \infty$ and $\bar t_a < \infty$ respectively, and subject to
the previous assumptions, is bounded from below by the mean wait per request in the deterministic
model with the same processing and access times $\bar t_p$ and $\bar t_a$ respectively.

Proof: 

Given in the above development.

We can also say that $w(N+1) - w(N) \ge 0$ (where we use the notation $w(N)$ to indicate the
mean waiting time $\bar t_w$ in an $N$ processor system). This follows since adding another processor can-
not cause the mean waiting time to decrease. In addition, it seems intuitive that
$w(N+1) - w(N) \le \bar t_a$: an arriving request in the $N+1$ processor system ought to see at worst one
more request in the queue than it would in the corresponding $N$ processor system. The following
theorem justifies this intuitive feeling.

Theorem 2.2

Consider the queueing model described previously with stationary processing and access time
distributions with means $\bar t_p < \infty$ and $\bar t_a < \infty$ respectively and subject to the previous assumptions.
Then $w(N+1) - w(N) \le \bar t_a$, where $w(N)$ denotes the mean waiting time in an $N$ processor model.

Proof: 

Given in Appendix D.

The foregoing allows us to conclude that the mean wait per request for any stationary pro-
cessing and access time distributions has a curve of the general shape indicated below. The ran-
domness introduced by the probability distributions rounds the "knee" of the curve.
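The mean queue length (requests waiting, excluding the one in service) is $\bar n_q = \sum_{k=2}^{N} (k-1)\,p_k$,
where $p_k$ is the steady state probability that $k$ requests are queued for service or in service, and
the average arrival rate to the queue is $\lambda_{eff} = \lambda \sum_{k=0}^{N-1} (N-k)\,p_k$. By Little's Law, the mean
queueing time (mean time waiting in queue before being served) is $\bar t_w = \bar n_q / \lambda_{eff}$. The normalized
mean wait per request is thus

$$\frac{\bar t_w}{\bar t_a} = \alpha \, \frac{\displaystyle\sum_{k=2}^{N} \frac{N!\,(k-1)\,\alpha^{-k}}{(N-k)!}}{\displaystyle\sum_{k=0}^{N-1} \frac{N!\,\alpha^{-k}}{(N-k-1)!}} \qquad (2.2)$$

where $\alpha = \dfrac{\mu}{\lambda} = \dfrac{\bar t_p}{\bar t_a}$.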


Results for the cases $\alpha$ = 1.0, 2.0, 5.0, and 10.0 are displayed in Figure 2.5.
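Equation (2.2) is straightforward to evaluate numerically. The Python sketch below is illustrative only: it codes (2.2) directly and, separately, computes the bus utilization $\rho = 1 - p_0$ from birth-death probabilities $p_k \propto \frac{N!}{(N-k)!}\alpha^{-k}$ (the standard M/M/1//N solution, assumed here because its derivation falls outside this excerpt). This lets the reader confirm that (2.2) agrees with the general relation $\bar t_w / \bar t_a = N/\rho - N^*$ of section 2.3 and satisfies Lemma 2.1 and Theorem 2.2. The function names are hypothetical.

```python
from math import factorial


def mm1n_normalized_wait(n, alpha):
    """Equation (2.2): normalized mean wait t_w/t_a in the M/M/1//N model."""
    num = sum(factorial(n) * (k - 1) * alpha ** (-k) / factorial(n - k)
              for k in range(2, n + 1))
    den = sum(factorial(n) * alpha ** (-k) / factorial(n - k - 1)
              for k in range(0, n))
    return alpha * num / den


def mm1n_utilization(n, alpha):
    """Bus utilization rho = 1 - p_0, with p_k proportional to N!/(N-k)! * alpha**-k."""
    weights = [factorial(n) / factorial(n - k) * alpha ** (-k) for k in range(n + 1)]
    return 1.0 - weights[0] / sum(weights)
```

For example, `mm1n_normalized_wait(2, 1.0)` evaluates to 0.5, which agrees with $N/\rho - N^* = 2/0.8 - 2$.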
2.5 Exponential Distributed Processing and General Service - M/G/1//N Model

In this section we generalize the M/M/1//N model of the previous section to include any
stationary memory access (or service) time distribution. With a general service distribution, the
probability distribution of the remaining service time, given that there is a customer in service,
depends on the time that the customer has already been in service. In such a case, the service
time distribution is said to have memory. Since a state must include all history or must summarize
all the history of the system relevant to predicting the future of the system, the state description of
whatever state the server is in must include the expended service time (or alternately, the time
remaining in the service of the customer) whenever a customer is in service.

For example, one state description of the M/G/1//N system is to let the states be $(k, t)$,
where $k$ requests are queued for service or in service and the request presently in service has been
in service for time $t$, $1 \le k \le N$, $t \ge 0$; and $(0)$ when no requests are queued for service or in ser-
vice. The exponential distribution has the special property that the probability distribution of the
time remaining is independent of the time expired so far. This memoryless property is the reason
why the service time completed so far is irrelevant for the M/M/1//N model (which is why the
state in the previous section was simply $(k)$, $0 \le k \le N$), and is the reason why the processing time
completed so far at each processor is irrelevant for both the M/G/1//N and M/M/1//N models.
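The memoryless property, and its absence for a general service distribution, can be seen in a few lines. The Python sketch below compares the conditional probability that service lasts a further $t$, given that $e$ has already elapsed, for an exponential distribution and for a uniform distribution. The uniform distribution on 1.02 to 1.82 μsec is a hypothetical example, chosen only because section 2.5.2 quotes that range for Multibus access times; the function names are likewise illustrative.

```python
import math


def residual_survival_exponential(t, elapsed, mean):
    """P(X > elapsed + t | X > elapsed) for an exponential X with the given mean."""
    def survival(x):
        return math.exp(-x / mean)
    return survival(elapsed + t) / survival(elapsed)


def residual_survival_uniform(t, elapsed, a, b):
    """Same conditional survival for X uniform on [a, b] (a distribution with memory)."""
    def survival(x):
        if x <= a:
            return 1.0
        if x >= b:
            return 0.0
        return (b - x) / (b - a)
    return survival(elapsed + t) / survival(elapsed)
```

For the exponential case, calls with different elapsed times return the same value. For the uniform case, the probability of at least 0.3 μsec more service drops from 1 to about 0.52 once 1.2 μsec of service has already elapsed, so the elapsed time must be part of the state.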

The fact that time must be included in the state description complicates the analysis of the
M/G/1//N model. We must now deal with an uncountably infinite number of states rather than
the finite number of the M/M/1//N model. Three analytical methods are common for finding
the steady state distribution of the number of requests queued for service or in service, from
which we can then find the mean waiting time per request.

1.  Stages 

In this method, the server is subdivided into a number of stages where each stage has an
exponential service distribution and only a single customer is allowed into the system of stages at a
time (just as only a single customer is in the original server at a time). Considering the entry and
exit points of the server to be special stages with zero service time, the next stage a customer
enters after leaving the present stage is governed by a probability distribution which may depend
on the present stage. The mean service time in each stage may also depend on the stage. Cox
[C4] has shown that it is possible to synthesize any probability density which has a rational Laplace
transform by a system of stages as just described.† Cox has also shown that the system of stages in
Figure 2.6 is canonic in that it captures the full generality of densities which can be synthesized by
the method of stages. In particular, feedback and/or feedforward paths add no further generality.
It is sometimes convenient to consider series-parallel or parallel-series arrangements of the stages,
rather than the ladder arrangement in Figure 2.6.

† If complex values are permitted for the exponential parameters. (Recall that an exponential distribution is fully
characterized by a single parameter equal to the reciprocal of the mean.)


Figure 2.6: Canonic ladder arrangement of stages (exponential service time stages, stage $i$ with parameter $\mu_i$)


The advantage of the method of stages is that the state space is now finite, or at worst count-
ably infinite. This arises because each stage in the server is exponential and thus it suffices for the
state to include just the stage in which the customer is, rather than the time completed so far in
the service.

The resulting state transition diagram will be similar to that in Figure 2.6 in the previous sec-
tion except that the states are more conveniently arranged in a two dimensional manner and
transitions are not limited to nearest neighbours. The equilibrium equations relating the steady
state probabilities are easily obtained. Since these are linear equations it is in principle straightfor-
ward to find the steady state probabilities. Note that these are the steady state unconditional probabil-
ities; they must be summed over the appropriate states to obtain the steady state marginal proba-
bilities such as the number of requests queued for service or in service.
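As an illustration of the method of stages, the Python sketch below builds the two dimensional state space for the special case of an $r$ stage Erlangian service distribution (a simple series arrangement in which every stage has mean $\bar t_a / r$, rather than the general ladder of Figure 2.6). It solves the linear equilibrium equations by Gaussian elimination and converts the utilization $\rho$ into $\bar t_w / \bar t_a = N/\rho - N^*$. The state labels and function names are hypothetical, and the dense elimination is chosen for self-containment; a larger model would call for a sparse linear solver.

```python
def solve_stationary(q):
    """Solve pi Q = 0 with sum(pi) = 1 by dense Gaussian elimination."""
    size = len(q)
    # Transpose Q so the unknowns form a column vector, then replace the
    # last (redundant) balance equation by the normalization condition.
    a = [[q[col][row] for col in range(size)] for row in range(size)]
    b = [0.0] * size
    a[-1] = [1.0] * size
    b[-1] = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            if factor != 0.0:
                for c in range(col, size):
                    a[row][c] -= factor * a[col][c]
                b[row] -= factor * b[col]
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        acc = b[row] - sum(a[row][c] * x[c] for c in range(row + 1, size))
        x[row] = acc / a[row][row]
    return x


def erlang_stages_wait(n, alpha, r, t_a=1.0):
    """Normalized mean wait t_w/t_a for M/E_r/1//N via the method of stages."""
    lam = 1.0 / (alpha * t_a)      # exponential processing rate, 1/t_p
    stage_rate = r / t_a           # each of the r stages has mean t_a / r
    # State 0: bus idle. State (k, j): k requests at the bus, the one in
    # service currently in stage j.
    index = {0: 0}
    for k in range(1, n + 1):
        for j in range(1, r + 1):
            index[(k, j)] = len(index)
    q = [[0.0] * len(index) for _ in index]

    def add(src, dst, rate):
        q[index[src]][index[dst]] += rate
        q[index[src]][index[src]] -= rate

    add(0, (1, 1), n * lam)
    for k in range(1, n + 1):
        for j in range(1, r + 1):
            if k < n:                       # a processing CPU issues a request
                add((k, j), (k + 1, j), (n - k) * lam)
            if j < r:                       # move on to the next stage
                add((k, j), (k, j + 1), stage_rate)
            elif k == 1:                    # last request leaves: bus idle
                add((k, j), 0, stage_rate)
            else:                           # next queued request starts stage 1
                add((k, j), (k - 1, 1), stage_rate)
    pi = solve_stationary(q)
    rho = 1.0 - pi[index[0]]
    return n / rho - (alpha + 1.0)
```

With $r = 1$ the model reduces to M/M/1//N (for example `erlang_stages_wait(2, 1.0, 1)` gives 0.5). Increasing $r$ makes the service time less variable, and the computed wait decreases toward the M/D/1//N value.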

The method of stages has three disadvantages. First, closed form results are difficult to
obtain except in special cases due to the complexity of solving a large number of simultaneous
linear equations. Thus it is difficult to determine how the result varies as a function of the input
parameters such as mean arrival and mean service times without recomputing the result for each
set of parameters.

Second, the exponential parameter and next stage probability distribution must be found for
each stage, preferably so as to minimize the number of stages required to represent a given proba-
bility distribution. This can be accomplished either by matching the poles and zeroes of the
Laplace transform of the stage system with the poles and zeroes of the Laplace transform of the
given probability density, or by matching polynomial coefficients of the two Laplace transforms
(both amount to the same thing). In either case, the matching involves solving a set of nonlinear
equations relating the stage parameters. The number of stages required is equal to the number of
poles in the Laplace transform of the given probability density, assuming all pole-zero
cancellations have been removed. As might be imagined, certain interconnections of the stages
make the solution of the simultaneous equations easier than others. While straightforward in prin-
ciple, finding the stage parameters requires a substantial amount of work in the general case.

Third, only probability densities with rational Laplace transforms can be handled exactly in a
finite number of stages. However, since any nonrational function can be approximated arbitrarily
closely by rational functions, we can in principle use the method of stages for any arbitrary (sta-
tionary) probability density. The problem in practice is how best to approximate a given distribu-
tion by one that has a rational transform.

2.  Imbedded  Markov  Chain 

In this method, the two dimensional state description $(k, t)$ of the system is reduced to a one
dimensional state description $(k)$ by looking at the system only at select points in time. These
points must be such that, given the number in the system,* $(k)$, at one such point, and the inputs
to the system, we can calculate the number in the system at the next point in time. Thus
these points must implicitly include the time that has been expended on the customer in service.

One set of such points is the service departure times, i.e. the times at which a customer com-
pletes service. At a departure instant, the expended service of the next customer is zero (and the
residual service of the present customer is zero) and the time to the next departure is given by the
unconditional service time distribution as long as at least one customer is left in the system. If the
system is empty, the time to the next departure instant is given by the convolution of the arrival
time distribution (which is exponential with parameter $N\lambda$ for the M/G/1//N case) with the
unconditional service time distribution.‡

The behavior of the system at the imbedded points (the departure instants) can be
described by a Markov chain. Let the state of the Markov chain be the number of customers in
the system immediately after a departure. The transition probabilities can be determined from the
arrival and service time distributions. The steady state solution of the Markov chain gives the
steady state probability of finding $k$ customers in the original system at the departure instants, but
not the correct steady state probability at arbitrary times between departures. (It actually does
give the correct results at all times if the customer population $N$ is infinite and the arrival time dis-
tribution is exponential.) However, for the mean waiting time, which is our concern in this
chapter, it is sufficient to determine the probability that the server is idle, and this is easy to
determine. Thus the steady state solution of the one dimensional imbedded Markov chain at
departure instants is sufficient to find the mean waiting time.

* By system we mean in this case the FCFS queue and its server.

‡ The probability density of the sum of two independent random variables is the convolution of the two
respective probability densities.

Other sets of points exist which may be used to derive an imbedded Markov chain, but they
are not as convenient since the expended service of the next customer will not be zero (otherwise
we have the same set of points as before). This necessitates handling the messy case when a cus-
tomer does not remain in service long enough to reach the imbedded point.

The advantage of the imbedded Markov chain method is that general service time distribu-
tions may be handled explicitly and without solving for a myriad of parameters as in the stage
method. The disadvantage is again that it is difficult to obtain closed form results. This is princi-
pally due to all the bookkeeping required to keep track of the number of "active" arrival genera-
tors in the finite population case. Such bookkeeping is unnecessary in the infinite population case,
and explicit results for the mean waiting time (depending only on the mean arrival rate and the
mean and variance of the service time) and the waiting time distribution can be obtained.
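For comparison, the infinite population result mentioned above is the standard Pollaczek-Khinchine mean value formula for the M/G/1 queue, $\bar t_w = \lambda E[t_a^2] / (2(1 - \rho))$ with $\rho = \lambda \bar t_a$ and $E[t_a^2] = \mathrm{Var}[t_a] + \bar t_a^2$, where $\lambda$ here is the total Poisson arrival rate rather than a per-processor rate. This is textbook material rather than a result of this chapter. The Python sketch below (function name hypothetical) simply evaluates it, making visible that only the mean arrival rate and the mean and variance of the service time enter.

```python
def pollaczek_khinchine_wait(lam, mean_s, var_s):
    """Mean queueing time of an M/G/1 queue (infinite population).

    lam    -- Poisson arrival rate of requests
    mean_s -- mean service (access) time
    var_s  -- variance of the service time
    """
    rho = lam * mean_s
    if not 0.0 <= rho < 1.0:
        raise ValueError("need 0 <= lam * mean_s < 1 for a stable queue")
    second_moment = var_s + mean_s ** 2
    return lam * second_moment / (2.0 * (1.0 - rho))
```

With $\lambda = 0.5$ and unit mean service time, exponential service (variance 1) gives a mean wait of 1.0, while constant service (variance 0) gives 0.5.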


3.  Supplementary  Variables 

In this method the problem posed by the two dimensional discrete-continuous state space
$(k, t)$ (for $k \ne 0$) is attacked directly by solving the related differential-difference equations. Closed
form results for arbitrary service time densities can be obtained by this method. We give the main
results below, from the derivation of Jaiswal [J1]. Let

$\rho$ be the server utilization, i.e. the probability that the server is busy

$b$ be the mean busy period of the server (the mean time interval between
the server being idle)

$1/\lambda$ be the mean of the exponential processing time

$t_a$ be the service time (i.e. access time) with density $f(t_a)$ and mean $\bar t_a$
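By applying Little's Law twice we can determine the mean waiting time (i.e. queueing time)
per request, $\bar t_w$. From Little's Law we have $\rho = \lambda_{eff}\,\bar t_a$, where
$\lambda_{eff} = \lambda \times (\text{average number of processors running}) = \lambda (N - L)$ and $L$ denotes the mean
number of requests in the FCFS queue or in service. Thus $L = N - \dfrac{\rho}{\lambda \bar t_a} = N - \alpha\rho$, where
$\alpha = \dfrac{1}{\lambda \bar t_a} = \dfrac{\bar t_p}{\bar t_a}$. Again from Little's Law we have $\bar t_w = \dfrac{L}{\lambda_{eff}} - \bar t_a$. Therefore

$$\frac{\bar t_w}{\bar t_a} = \frac{L}{\rho} - 1 = \frac{N}{\rho} - \alpha - 1.$$

The server alternates between busy periods of mean $b$ and idle periods that end as soon as one of
the $N$ processing processors issues a request, so the idle periods have mean $1/(N\lambda)$ and
$\rho = b / (b + 1/(N\lambda))$. Substituting for $\rho$, we obtain

$$\frac{\bar t_w}{\bar t_a} = N - N^* + \frac{\alpha}{b_N} \qquad (2.4)$$

where $N^* = \alpha + 1$ and $b_N = b / \bar t_a$ is the normalized mean busy period, i.e. the average number of
consecutive requests served without an intervening idle period.

Equation (2.4) should be familiar: it is just $N - N^*$, the deterministic result for $N > N^*$, plus $\alpha / b_N$.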
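Equation (2.4) needs $b_N$, whose closed form is not reproduced in this excerpt. The Python sketch below assumes the standard finite-source (Takács) form $b_N = \sum_{i=0}^{N-1} \binom{N-1}{i} \prod_{m=1}^{i} \frac{1 - F^*(m\lambda)}{F^*(m\lambda)}$, where $F^*(s)$ is the Laplace transform of the access time density; this form should be checked against Jaiswal's derivation before being relied upon. In the cases checked it reproduces the M/M/1//N values of equation (2.2) (e.g. $N = 3$, $\alpha = 1$ gives 1.2 either way), and passing the deterministic transform $e^{-s\bar t_a}$ gives the M/D/1//N model. All function names are hypothetical.

```python
import math


def normalized_busy_period(n, lam, laplace):
    """Assumed form: b_N = sum_i C(N-1, i) prod_{m<=i} (1 - F*(m lam)) / F*(m lam)."""
    total = 0.0
    for i in range(n):
        prod = 1.0
        for m in range(1, i + 1):
            f = laplace(m * lam)
            prod *= (1.0 - f) / f
        total += math.comb(n - 1, i) * prod
    return total


def mg1n_normalized_wait(n, alpha, laplace, t_a=1.0):
    """Equation (2.4): t_w/t_a = N - N* + alpha / b_N, with N* = alpha + 1."""
    lam = 1.0 / (alpha * t_a)   # exponential processing rate 1/t_p
    return n - (alpha + 1.0) + alpha / normalized_busy_period(n, lam, laplace)


def exponential_lt(t_a=1.0):
    """Laplace transform of an exponential access time with mean t_a."""
    return lambda s: 1.0 / (1.0 + s * t_a)


def deterministic_lt(t_a=1.0):
    """Laplace transform of a constant access time t_a."""
    return lambda s: math.exp(-s * t_a)
```

Evaluating both transforms over a range of $N$ shows the M/D/1//N wait lying between the bound $\max(0, N - N^*)$ of Lemma 2.1 and the M/M/1//N wait, as Theorem 2.3 and the argument of section 2.5.2 lead one to expect.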

Equation (2.4) might lead one to conjecture that the maximum difference in mean waiting
time per request between the M/G/1//N and deterministic model (section 2.2) occurs at the knee
$N = N^*$. The following lemma shows that this conjecture is indeed correct, in an even more general
setting, provided $N^*$ is an integer. The treatment must be more careful for non-integer $N^*$, since
the queueing system model allows only integer $N$. The general idea, however, still holds when $N^*$
is non-integer. (The graphs have been drawn as continuous in $N$ to emphasize the trends.)


Lemma:

Let $w(N)$ be the mean waiting time per request in a G/G/1//N queueing system with arbi-
trary processing and access time distributions with means $\bar t_p$ and $\bar t_a$ respectively. Let $w_D(N)$ be
the mean waiting time per request in a D/D/1//N queueing system with constant processing and
access times $\bar t_p$ and $\bar t_a$ respectively. Then the difference $w(N) - w_D(N)$ is maximum at either

$$N = \lfloor N^* \rfloor \quad \text{or} \quad N = \lceil N^* \rceil$$

where $N^* = \alpha + 1$, $\alpha = \bar t_p / \bar t_a$.
Proof:

Consider $1 \le N \le N^*$:

From section 2.2, $w_D(N) = 0$ in this range. In addition, $w(N+1) - w(N) \ge 0$ for
every $N \ge 1$, i.e. $w(N)$ is nondecreasing in $N$. Thus $w(N) - w_D(N)$ is maximum
for $1 \le N \le N^*$ when $N$ is the largest integer less than or equal to $N^*$, i.e.
$N = \lfloor N^* \rfloor$.

Consider $N^* \le N$:

From section 2.2, $w_D(N) = (N - N^*)\,\bar t_a$ and $w_D(N+1) - w_D(N) = \bar t_a$ in this range. In
addition, $w(N+1) - w(N) \le \bar t_a$ by Theorem 2.2. Let $N^\circ$ be the smallest integer
greater than or equal to $N^*$, i.e. $N^\circ = \lceil N^* \rceil$, and let $w(N^\circ) - w_D(N^\circ) = \delta$.
Then $w(N^\circ + 1) - w_D(N^\circ + 1) \le w(N^\circ) - w_D(N^\circ) = \delta$. By induction on
$n = 0, 1, 2, \ldots$ we have $w(N^\circ + n) - w_D(N^\circ + n) \le \delta$ for all $n \ge 0$. Thus
$w(N) - w_D(N)$ is maximum for $N^* \le N$ when $N = \lceil N^* \rceil$.

Therefore $w(N) - w_D(N)$ is maximum at either $N = \lfloor N^* \rfloor$ or $N = \lceil N^* \rceil$.

Remark:

If $N^*$ is noninteger these two points are distinct, and the one at which the maximum occurs
depends on $N^* - \lfloor N^* \rfloor$ and $w(N)$.
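The lemma can be illustrated numerically by taking the M/M/1//N model as the "general" system. The Python sketch below (function names hypothetical) evaluates $w(N) - w_D(N)$ in units of $\bar t_a$. It obtains $w(N)$ from $\bar t_w / \bar t_a = N/\rho - N^*$ with the standard M/M/1//N utilization $\rho = 1 - p_0$, takes $w_D(N)$ as $\max(0, N - N^*)$, and reports the $N$ at which the difference is largest.

```python
from math import ceil, factorial, floor


def mm1n_wait(n, alpha):
    """t_w/t_a for M/M/1//N via N/rho - N*, with rho = 1 - p_0."""
    weights = [factorial(n) / factorial(n - k) * alpha ** (-k) for k in range(n + 1)]
    rho = 1.0 - weights[0] / sum(weights)
    return n / rho - (alpha + 1.0)


def dd1n_wait(n, alpha):
    """t_w/t_a for D/D/1//N: max(0, N - N*)."""
    return max(0.0, n - (alpha + 1.0))


def knee_of_difference(alpha, n_max=40):
    """Return the N in 1..n_max that maximizes w(N) - w_D(N) (units of t_a)."""
    return max(range(1, n_max + 1),
               key=lambda n: mm1n_wait(n, alpha) - dd1n_wait(n, alpha))


def lemma_candidates(alpha):
    """The two candidate points of the lemma, floor(N*) and ceil(N*)."""
    n_star = alpha + 1.0
    return floor(n_star), ceil(n_star)
```

For $\alpha = 2.5$ ($N^* = 3.5$), `knee_of_difference(2.5)` should return one of the two candidates, 3 or 4, returned by `lemma_candidates(2.5)`.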

2.5.1 Exponential Distributed Processing and Deterministic Service - M/D/1//N Model

We now consider as a special case of the foregoing a model with deterministic (constant)
memory access times. This special case is interesting for two reasons. The first reason is that
memory accesses on the isolated Multibus directed to the global memory have a relatively constant
duration. There is still randomness associated with the access time due to such factors as read-
modify-write accesses (which have a significantly longer access time than normal read and write
accesses) and variations in the propagation delays of the logic circuitry and signal paths. If we
consider read-modify-write accesses to be so infrequent that they can be ignored, we can get some
idea of the Multibus access time distribution by referring to section 3 of Appendix A. Roughly
90% or more of the Multibus accesses to the global memory take 1.00 or 1.10 μsec. Thus a con-
stant access time seems like a reasonable approximation in this case. However, memory accesses
on the Multibus directed to local memory modules can vary over a much wider range (as indicated
in Figure A.5 in Appendix A) due to the IlSil traffic on the other port of the accessed memory.
Thus a constant access time does not seem like a reasonable approximation in this case.
The second reason for considering the deterministic case is that the mean wait per request in
the deterministic case provides a lower bound on the mean wait per request for all M/G/1//N
models with the same mean processing and access times. Thus although the exact access time dis-
tribution may not be known (or may be too variable to be considered constant), we can still bound
the behavior of the mean waiting time.

Theorem  2.3 

Given that the mean processing and access times are the same in both the M/G/1//N sys-
tem and the M/D/1//N system, the mean waiting time (queueing time) in the M/G/1//N system
is bounded from below by the mean waiting time in the M/D/1//N system.

Proof: 

Following Price [P3], and referring to the M/G/1//N results presented earlier, we
have:

$\bar t_w$ is strictly increasing in $L$,

$L$ is strictly decreasing in $\rho$,

$\rho$ is strictly increasing in $b$,

$b$ is strictly decreasing in $\varphi(i)$, and

$\varphi(i)$ is strictly increasing in the function $F^*(s)$.

Thus $\bar t_w$ is minimized when $F^*(s)$ is minimized. Now from Jensen's Inequality [1*1,
p. 434], $F^*(s) = E[e^{-s t_a}] \ge e^{-s \bar t_a}$, which is the transform of a deterministic
function. Therefore a constant service time of duration $\bar t_a$ gives a lower bound on the
mean waiting time among all distributions with the same mean $\bar t_a$.
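The Jensen's Inequality step can be checked on any concrete distribution. The Python sketch below uses a hypothetical discrete access time distribution (1.00 μsec and 1.10 μsec with probability 0.45 each, and 1.82 μsec with probability 0.10; these weights are assumed for illustration and are not measurements) and evaluates the gap between its Laplace transform and that of a constant access time with the same mean.

```python
import math


def laplace_discrete(values, probs, s):
    """F*(s) = E[exp(-s X)] for a discrete random variable X."""
    return sum(p * math.exp(-s * v) for v, p in zip(values, probs))


# Hypothetical access time distribution in microseconds (illustrative weights).
values = [1.00, 1.10, 1.82]
probs = [0.45, 0.45, 0.10]
mean = sum(v * p for v, p in zip(values, probs))


def jensen_gap(s):
    """E[exp(-s X)] - exp(-s E[X]); Jensen's inequality says this is >= 0."""
    return laplace_discrete(values, probs, s) - math.exp(-s * mean)
```

The gap is positive for every real $s > 0$ tried, so the constant access time has the smaller transform and, by the chain of monotonicity statements above, the smaller mean waiting time.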

All three methods mentioned earlier for the M/G/1//N model have been applied to the
solution of the M/D/1//N model. Benson and Cox [B2] used the method of stages. They
obtained a closed form solution for a service distribution consisting of a cascade of $r$ exponential
stages (called an $r$ stage Erlangian distribution and denoted by $E_r$) and then took the limit as
$r \to \infty$. Raskin [R1] employed an imbedded Markov chain. Jaiswal obtained the closed form solu-
tions presented earlier using the technique of supplementary variables. In addition, Ashcroft [A3]
has derived a solution for the M/G/1//N model starting with an expression for the mean busy
period.

The actual results for the mean waiting time per request in the M/D/1//N model are plot-
ted in Figure 2.7 for the same cases as in the M/M/1//N model. (The data for this Figure are
taken from Ashcroft's paper.) For purposes of comparison, the earlier M/M/1//N results are also
plotted. Note that the M/M/1//N and M/D/1//N results are very similar except around the
"knee" of the curves.

We also observe the following:

For a given $\alpha$, the difference in mean waiting time for the M/M/1//N and
M/D/1//N models first increases with $N$ and then decreases with $N$. Similarly, for a
given $N$, the difference first increases with $\alpha$ and then decreases with $\alpha$. The max-
imum difference in the mean waiting times occurs close to the "knee" at $N = N^*$ and
increases with $N^*$ (in fact the maximum difference occurred at either $N = \lfloor N^* \rfloor$ or
$N = \lfloor N^* \rfloor + 1$ in the cases in which numerical results were computed).

The validity of these observations in the general case may be ascertained by examining the differ-
For $N = 1$, $b_N^{M/D/1//N} = b_N^{M/M/1//N} = 1$ and hence $r_w = 1$. As $N \to \infty$, $b_N^{M/D/1//N}$ and
$b_N^{M/M/1//N}$ both tend to $\infty$ (at different rates) and $N - N^* \to \infty$; hence $r_w \to 1$. For $N = N^*$,

$$r_w = \frac{b_N^{M/D/1//N}}{b_N^{M/M/1//N}}$$

which is clearly greater than 1 for $N^* > 1$. Thus $\Delta w(N)$ must increase and then later decrease with
$N$.

For small values of $\alpha$, $b_N^{M/M/1//N}$ and $b_N^{M/D/1//N}$ are large and hence $r_w \approx 1$. For large values
of $\alpha$, $b_N^{M/M/1//N} \approx b_N^{M/D/1//N} \approx 1$, and again $r_w \approx 1$. For all values of $\alpha$,
$b_N^{M/D/1//N} > b_N^{M/M/1//N}$; in particular, $r_w > 1$ for medium values of $\alpha$. Since $r_w$ is continuous in
$\alpha$, this is enough to conclude that $\Delta w(N)$ increases with $\alpha$ and later decreases with $\alpha$ (although
not necessarily monotonically).


2.5.2  Comments 


It is difficult to say much more of interest about the M/G/1//N model without some
knowledge of the access time distribution; indeed, the mean waiting time per request is completely
specified by the closed form expression given earlier once the distribution is known.

From section A.1.3 of Appendix A we see that all access times must be in the range 1.02 μsec to 1.82 μsec (allowing for best and worst case propagation delays and traffic on the other memory port, and assuming no read-modify-writes). One might conjecture that because this access time distribution is more "deterministic" than an exponential one with the same mean (and certainly does not have the long tails of the exponential), the mean waiting time ought to be bounded from above by that for an exponential distribution with the same mean. This is indeed the case, as the following argument shows.


Recall that equation 2.4 gives the mean waiting time per request in closed form in terms of the Laplace transform $F^*(s)$ of the access time density, evaluated at the points $s = i\lambda$. As discussed in the proof of Theorem 2.3, this expression is strictly increasing in $F^*(s)$. Thus to show $\bar{w}_{M/M/1//N} \ge \bar{w}_{M/G/1//N}$ it suffices to show that $F^*(i\lambda)_{M/M/1//N} > F^*(i\lambda)$ for all $i$ and $\lambda > 0$.
Theorem 2.4

Let $F^*_{ab}(s)$ denote the Laplace transform of the probability density function $f_{ab}(x)$ with mean $\bar{x}$, where $f_{ab}(x) = 0$ for $x \notin [a, b]$, $0 < a$, and $b < 2a$. Let $F^*(s)_{M/M/1//N}$ denote the Laplace transform of the exponential density function with the same mean $\bar{x}$. Then $F^*(s)_{M/M/1//N} > F^*_{ab}(s)$ for $s$ real and $s > 0$.

The proof of this theorem is given in Appendix B. For the case at hand $a = 1.02$ and $b = 1.82 < 2a$, thus $F^*(i\lambda)_{M/M/1//N} > F^*_{ab}(i\lambda)$ for every $i$ and $\lambda > 0$.
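As a numerical spot check of Theorem 2.4 for the Concert access-time range, the sketch below compares the exponential transform with two distributions on [1.02, 1.82]: a uniform density, and the two-point density on the endpoints, which (because $e^{-sx}$ is convex in $x$) has the largest transform of any distribution on $[a, b]$ with a given mean. The common mean of 1.42 μsec (the midpoint of the range), the function names, and the grid of $s$ values are assumptions made only for this illustration; a grid check is not a proof.

```python
import math

A, B = 1.02, 1.82      # access-time range in microseconds (section 2.5.2)
MEAN = (A + B) / 2     # assumed common mean for this illustration (midpoint)

def lst_exponential(s, mean=MEAN):
    """Laplace transform of an exponential density with the given mean."""
    return 1.0 / (1.0 + s * mean)

def lst_uniform(s, a=A, b=B):
    """Laplace transform of a uniform density on [a, b] (requires s > 0)."""
    return (math.exp(-s * a) - math.exp(-s * b)) / (s * (b - a))

def lst_two_point(s, a=A, b=B, mean=MEAN):
    """Laplace transform of the two-point distribution on {a, b} with the given mean.

    Since exp(-s x) is convex in x, this distribution maximizes the transform
    among all distributions supported on [a, b] with that mean.
    """
    p_a = (b - mean) / (b - a)
    return p_a * math.exp(-s * a) + (1 - p_a) * math.exp(-s * b)

def check_theorem_2_4(s_values):
    """Return the s values (if any) where the exponential transform fails to dominate."""
    return [s for s in s_values
            if not (lst_exponential(s) > lst_two_point(s) >= lst_uniform(s))]

grid = [10 ** (k / 10) for k in range(-30, 21)]   # s from 0.001 to 100
print(check_theorem_2_4(grid))   # an empty list means no violation on this grid
```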

Theorems 2.3 and 2.4 imply that the mean waiting time for the M/G/1//N model as presented in section 2.5 is bounded above and below by the M/M/1//N and M/D/1//N models respectively, with the same mean processing and access times. Therefore a quick characterization of the mean waiting time of the M/G/1//N model with any access time distribution (obeying the restrictions in Theorem 2.4) can be obtained from the M/M/1//N and M/D/1//N models. Furthermore, by analogy with the Pollaczek-Khinchin formula for the mean waiting time in the M/G/1 queue*, one would expect the mean waiting time to vary approximately linearly with the square of the coefficient of variation of the access time distribution, given by $C_x^2 = \sigma_x^2 / \bar{x}^2$.

However, as Price [I’M] points out by means of example, this can be misleading since the variance can be dominated by a few long access times which have little effect on the mean waiting time.

A reasonable model for the access time distribution is an $r$-stage Erlangian distribution. Figure 2.8 shows how the Erlangian density function varies with $r$.
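To see how the shape changes with $r$ while the mean stays fixed, the sketch below evaluates the $r$-stage Erlangian density with every stage at rate $r/\bar{x}$, so that the overall mean is $\bar{x}$ and $C_x^2 = 1/r$. The mean of 1.42 μsec (the midpoint of the 1.02 to 1.82 μsec range) is a hypothetical value chosen only for illustration, and the trapezoidal integration is just a numerical sanity check of the normalization and the mean.

```python
import math

def erlang_pdf(x, r, mean):
    """Density of an r-stage Erlangian distribution with the given overall mean.

    Each of the r exponential stages has rate r / mean, so the stages sum to `mean`.
    """
    rate = r / mean
    return rate ** r * x ** (r - 1) * math.exp(-rate * x) / math.factorial(r - 1)

def erlang_scv(r):
    """Squared coefficient of variation C_x^2 = sigma^2 / mean^2 of an r-stage Erlangian."""
    return 1.0 / r

def trapezoid(f, lo, hi, steps=20000):
    """Composite trapezoidal rule, used here only as a numerical check."""
    h = (hi - lo) / steps
    return h * (0.5 * f(lo) + sum(f(lo + k * h) for k in range(1, steps)) + 0.5 * f(hi))

mean = 1.42   # hypothetical mean access time in microseconds
for r in (1, 2, 4, 16):
    area = trapezoid(lambda x: erlang_pdf(x, r, mean), 0.0, 20 * mean)
    m1 = trapezoid(lambda x: x * erlang_pdf(x, r, mean), 0.0, 20 * mean)
    print(r, round(area, 4), round(m1, 4), erlang_scv(r))
```

As $r$ grows the density concentrates around the mean, moving from the exponential case ($r = 1$) toward the deterministic one.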


* The M/G/1 queue is an open queueing model (as opposed to the closed models considered in this chapter) with a Poisson arrival process and a general service process independent of the arrival process. The mean waiting time in the queue is

$$\bar{w} = \frac{\rho \, \bar{x} \, (1 + C_x^2)}{2(1 - \rho)},$$

where arrivals occur at rate $\lambda$, service has mean $\bar{x}$ and variance $\sigma_x^2$, $\rho = \lambda \bar{x}$, and $C_x^2 = \sigma_x^2 / \bar{x}^2$.
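The footnote's formula is easy to evaluate directly. The sketch below (the function name `pk_mean_wait` is hypothetical) holds $\rho$ and $\bar{x}$ fixed and varies only $C_x^2$, which makes the linear dependence on $C_x^2$ explicit. It applies to the open M/G/1 queue only, not to the closed models of this chapter.

```python
def pk_mean_wait(lam, mean_service, scv):
    """Pollaczek-Khinchin mean waiting time in queue for an open M/G/1 queue.

    lam          -- Poisson arrival rate (lambda)
    mean_service -- mean service time (x-bar)
    scv          -- squared coefficient of variation C_x^2 of the service time
    """
    rho = lam * mean_service
    if not 0 <= rho < 1:
        raise ValueError("the M/G/1 queue is stable only for 0 <= rho < 1")
    return rho * mean_service * (1 + scv) / (2 * (1 - rho))

# With rho fixed the wait is linear in C_x^2:
# M/D/1 (C^2 = 0), M/E4/1 (C^2 = 1/4), M/M/1 (C^2 = 1).
for name, scv in (("M/D/1", 0.0), ("M/E4/1", 0.25), ("M/M/1", 1.0)):
    print(name, pk_mean_wait(0.5, 1.0, scv))
```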
2.6 General Processing and Exponential Access Time Distributions - G/M/1//N Model

We now consider the effect of the processing time distribution on the mean waiting time per request. For this section we keep the service time distribution exponential to facilitate comparison with the earlier M/M/1//N model and to determine the relative effect of changes in processing and service time distributions with respect to the M/M/1//N model.

The G/M/1//N model could be solved using any of the three methods described in section 3. However, they all become cumbersome because whatever method is chosen must essentially be applied N times, since there are N general distributions. The state description must, explicitly or implicitly, contain the processing time completed so far at each processor that is busy and the number of requests waiting for or in service. Thus there are anywhere from 0 to $N$ continuous variables in the state description. This leaves the imbedded Markov chain and supplementary variable methods hopelessly complicated for reasonable values of $N$. Direct application of the method of stages is also very complicated. However, in the special case of the G/M/1//N model, because of the exponential access time distribution, the solution of the equilibrium equations has a very simple form.

2.6.1  Product  Form  Solutions 

In certain cases the steady-state probabilities for a system of two or more interconnected queues have the following form:

Let the vector $x_i$ denote the state of queue $i$, and let $\pi_{x_i}$ denote the steady-state probability of that state when queue $i$ is in isolation. Then the overall, or global, state of the system is given by $X = (x_1, x_2, \ldots, x_n)$. Denote the steady-state probability of global state $X$ by $\pi_X$. Then

$$\pi_X = C \prod_{i=1}^{n} \pi_{x_i},$$

where $C$ is a normalizing constant.


Any system in which the steady-state probabilities can be expressed in such a form is said to have a product form solution. Product form solutions are extremely convenient in that one can dispense with solving the global equilibrium equations; it is sufficient to solve for the steady-state probabilities of each queue in isolation. In the following we summarize the main results known pertaining to product form solutions in queueing networks as described by Kelly [K1].

The principal result is the following:

Suppose there are $n$ queues (the queue is thought of as a black box here and includes the server for that queue) and a total of $K$ classes of customers in the overall system. For each queue $i$ assume that no more than one customer enters or leaves the queue at any point in time. Let each customer in queue $i$ belong to some class $k$ in the set of classes $K(i)$ visiting that queue, and assume that customers cannot change class as they pass through the queue. Let the state of queue $i$ at time $t$ be denoted by $x_i(t)$ and assume that the state information allows the number of customers of each class at the queue to be determined. If each queue $i$ is quasi-reversible in isolation, then the equilibrium probability distribution has the product form given above.

A queue $i$ is quasi-reversible if:

1) its state $x_i(t)$ is a stationary Markov process,

2) the arrival times of class $k$ customers, $k \in K(i)$, after time $t$ are independent of $x_i(\cdot)$ before or at $t$,

3) the departure times of class $k$ customers, $k \in K(i)$, after time $t$ are independent of $x_i(\cdot)$ at or after $t$, and

4) the mean rates of class $k$ arrivals and departures are equal for every $k \in K(i)$.

If a queue is quasi-reversible, then points 2 and 3 imply that the arrival and departure processes of class $k$ customers are independent Poisson processes.

Two types of queues are known to be quasi-reversible. In both types, the arrival process of class $k$ customers is Poisson with rate $\lambda(k)$, giving a total arrival rate of $\lambda = \sum_k \lambda(k)$. The first type is distinguished by exponentially distributed service times with the same mean service for all classes of customers (although the mean may vary with the number of customers in the queue). Kelly [K1] describes this type of queue as follows:

Assume we are dealing with queue $i$ and let $n_i$ be the total number of customers in the queue.

(i) Each customer requires an amount of service which is a random variable exponentially distributed with mean $1/\mu$.

(ii) A total service effort is supplied at the rate $\phi_i(n_i)$, where $\phi_i(n_i) > 0$ if $n_i > 0$.

(iii) A proportion $\gamma_i(l, n_i)$ of this effort is directed to the customer in position $l$ ($1 \le l \le n_i$). When this customer completes service and leaves the queue, the customers in positions $l+1, l+2, \ldots, n_i$ move to positions $l, l+1, \ldots, n_i - 1$ respectively.

(iv) A customer arriving at queue $i$ moves into position $l$ ($1 \le l \le n_i + 1$) with probability $\delta_i(l, n_i + 1)$. Customers previously in positions $l, l+1, \ldots, n_i$ move to positions $l+1, l+2, \ldots, n_i + 1$ respectively.

The amount of service a customer requires at queue $i$ is assumed to be independent of the amount of service the same customer requires in other queues and independent of the amount of service all other customers in queue $i$ require. For example, a FCFS queue with $K$ classes of customers, each class with Poisson arrivals of rate $\lambda(k)$, $k \in K$, and the same exponentially distributed service for all customers can be described by:

$$\gamma(l, n) = \begin{cases} 1, & l = 1 \\ 0, & l = 2, \ldots, n \end{cases} \qquad \delta(l, n+1) = \begin{cases} 1, & l = n+1 \\ 0, & l = 1, \ldots, n \end{cases} \qquad \phi(n) = 1.$$

Quasi-reversible queues of the first type (also called generalized M/M/· queues) can be described by the state $x(t) = (n, c(1), \ldots, c(n))$, where $n$ is the number of customers in the queue and $c(l)$, $1 \le l \le n$, is the class of the customer in the $l$th position of the queue. The state $x(t)$ is a stationary Markov process with steady-state probability

$$\pi_x = \kappa \prod_{j=1}^{n} \frac{\lambda(c(j))}{\mu \, \phi(j)},$$

where $\kappa$ is a normalizing constant [K1].

The steady-state probability of the non-Markovian state $x'(t) = (n(1), n(2), \ldots, n(K))$, where $n(k)$ is the number of customers of class $k$ in the queue, can be found by considering all possible ways of arranging $n$ customers in $K$ classes:

$$\pi_{x'} = \frac{\kappa}{\prod_{j=1}^{n} \phi(j)} \, \frac{n!}{n(1)!\, n(2)! \cdots n(K)!} \, \rho_1^{n(1)} \rho_2^{n(2)} \cdots \rho_K^{n(K)}, \qquad \rho_k = \frac{\lambda(k)}{\mu}.$$

Finally, the steady-state probability of the non-Markovian state $x''(t) = (n)$ can be found by summing $\pi_{x'}$ over all possible ways to arrange $n$ customers:

$$\pi_{x''} = \frac{\kappa}{\prod_{j=1}^{n} \phi(j)} \sum_{\substack{n(1), \ldots, n(K) \ge 0 \\ n(1) + \cdots + n(K) = n}} \frac{n!}{n(1)!\, n(2)! \cdots n(K)!} \, \rho_1^{n(1)} \rho_2^{n(2)} \cdots \rho_K^{n(K)},$$

yielding

$$\pi_{x''} = \kappa \, \frac{\rho^n}{\prod_{j=1}^{n} \phi(j)}, \qquad \rho = \sum_{k=1}^{K} \rho_k = \frac{\lambda}{\mu}.$$
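The step from the class-level probabilities to $\pi_{x''} = \kappa \rho^n / \prod_j \phi(j)$ is the multinomial theorem. A brute-force check of that identity, with hypothetical per-class loads $\rho_k$, might look like:

```python
from itertools import product
from math import factorial, isclose, prod

def class_composition_sum(rhos, n):
    """Sum n!/(n(1)!...n(K)!) * prod_k rho_k^n(k) over all ways to split n into K classes."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(rhos)):
        if sum(counts) != n:
            continue                      # keep only n(1) + ... + n(K) = n
        coeff = factorial(n) // prod(factorial(c) for c in counts)
        total += coeff * prod(r ** c for r, c in zip(rhos, counts))
    return total

rhos = [0.2, 0.5, 0.1]   # hypothetical per-class loads rho_k = lambda(k) / mu
for n in range(6):
    # The multinomial theorem collapses the sum to rho^n with rho = sum of rho_k.
    assert isclose(class_composition_sum(rhos, n), sum(rhos) ** n)
```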
The second type of quasi-reversible queues is described by Kelly [K1] in a similar manner. The description is the same as above except for:

(i) The service required by a customer is a random variable from an arbitrary distribution which may depend on the class of the customer.

(iv) Same as above except that the symmetry condition $\delta(l, n_i + 1) = \gamma(l, n_i + 1)$ is imposed for every $l = 1, \ldots, n_i + 1$.

Queues of this second type are called symmetric queues. For example, a server-sharing queue (essentially a round-robin queue with infinitesimal quantum size, so all customers are effectively simultaneously in service) can be described by $\gamma(l, n) = \delta(l, n) = 1/n$, $l = 1, 2, \ldots, n$; $n > 0$, and $\phi(n) = 1$. A last come first served (LCFS) queue with preemption can be described by $\gamma(l, n) = \delta(l, n) = 1$ for $l = n$ (and 0 otherwise), $n = 1, 2, \ldots$, and $\phi(n) = 1$ for $n > 0$. Finally, an infinite server queue can be described by $\phi(n) = n$, $n \ge 1$, and $\gamma(l, n) = \delta(l, n) = 1/n$, $l = 1, 2, \ldots, n$; $n \ge 1$.
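The symmetry condition can be checked mechanically. The sketch below encodes $\gamma$, $\delta$ and $\phi$ for the four disciplines discussed here; the dictionary layout and names are illustrative and not from the source, and $\delta(l, n)$ is taken to be the probability that an arrival takes position $l$ when the queue then holds $n$ customers. Only FCFS fails the test, in agreement with the remark that follows.

```python
from fractions import Fraction

# Hypothetical encodings of (gamma, delta, phi) for the disciplines in the text.
# gamma(l, n): share of the service effort given to position l when n are present.
# delta(l, n): probability that an arrival takes position l, leaving n in the queue.
DISCIPLINES = {
    "FCFS": (lambda l, n: Fraction(int(l == 1)),
             lambda l, n: Fraction(int(l == n)),
             lambda n: 1),
    "server-sharing": (lambda l, n: Fraction(1, n),
                       lambda l, n: Fraction(1, n),
                       lambda n: 1),
    "LCFS-preemptive": (lambda l, n: Fraction(int(l == n)),
                        lambda l, n: Fraction(int(l == n)),
                        lambda n: 1),
    "infinite-server": (lambda l, n: Fraction(1, n),
                        lambda l, n: Fraction(1, n),
                        lambda n: n),
}

def is_symmetric(name, max_n=10):
    """Check the symmetry condition delta(l, n) == gamma(l, n) for n = 1..max_n."""
    gamma, delta, _ = DISCIPLINES[name]
    return all(gamma(l, n) == delta(l, n)
               for n in range(1, max_n + 1) for l in range(1, n + 1))

for name in DISCIPLINES:
    print(name, is_symmetric(name))
```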


Note that a FCFS queue is not a symmetric queue. Therefore a FCFS queue with anything other than the same exponentially distributed service for all customers (as described in the first type of quasi-reversible queues) does not fit into the two types of quasi-reversible queues just described. Indeed, such FCFS queues are not quasi-reversible since the departure process at time $t$ is not independent of the state $x(t)$ after $t$ (i.e. given the state describing the customers in the queue and the service time expended on the customer presently in service, some information about the next departure time(s) can be ascertained). As a result, no product form solutions are known for such FCFS queues.

As for the generalized M/M/· queues mentioned earlier, we can describe a symmetric queue by a Markov process, find the resulting steady-state probabilities, and then sum over various states to find the steady-state marginal probabilities. Skipping the intermediate steps (which follow directly from the steady-state probability distribution given in Kelly [K1]), we have for the non-Markovian state $x(t) = (n, c(1), \ldots, c(n))$ ($n$ and $c(l)$ are as defined before) the steady-state probability distribution

$$\pi_x = \kappa \prod_{j=1}^{n} \frac{\lambda(c(j)) \, E[x(c(j))]}{\phi(j)},$$

where $E[x(c(j))]$ is the mean service requirement of a class $c(j)$ customer.

For the non-Markovian states $x'(t) = (n(1), \ldots, n(K))$ and $x''(t) = (n)$, we get the same results as before with $\rho_k$ and $\rho$ now as follows:

$$\rho_k = \lambda(k) \, E[x(k)], \qquad \rho = \sum_{k=1}^{K} \rho_k.$$
a 
The only feature of networks of quasi-reversible queues that has not yet been discussed is the routing of customers within the network. The routing is formulated as follows: upon departing from a queue, a customer of class $k$ joins class $l$ with probability $r_{kl}$.† By adding a sufficient number of classes, routing can include dependency on previously visited queues and classes as well as on the initial class. For example, a deterministic route can correspond to each input class. In addition, routing can depend on quite detailed previous history (such as actual service times) provided that the next class depends only on the present class and that the queues remain quasi-reversible with respect to the classes.

The effective arrival rate of customers of class $k$ to the queueing network is

$$\lambda^{eff}(k) = \lambda(k) + \sum_{l} \lambda^{eff}(l) \, r_{lk},$$

where $\lambda(k)$ is the arrival rate of class $k$ customers from a source external to the network (external arrivals are assumed to belong to a Poisson process). The steady-state probability distribution of each quasi-reversible queue in isolation is computed assuming that the arrival process of each customer class is Poisson with rate given by the effective arrival rate of that class in the network. The overall steady-state probability distribution of the network is the product of the steady-state probability distributions of the queues in isolation.
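Because the traffic equations are linear, the effective arrival rates of an open network can be obtained with a single linear solve. A minimal sketch using NumPy, with a made-up two-class routing matrix; it assumes $I - R$ is nonsingular, which holds when every customer eventually leaves the network.

```python
import numpy as np

def effective_rates_open(external, routing):
    """Solve lambda_eff(k) = lambda(k) + sum_l lambda_eff(l) r_{lk} for an open network.

    external -- length-K vector of external Poisson arrival rates lambda(k)
    routing  -- K x K matrix with routing[k, l] = r_{kl}; row sums < 1 model departures
    """
    external = np.asarray(external, dtype=float)
    routing = np.asarray(routing, dtype=float)
    k = len(external)
    # Row-vector equation lambda_eff (I - R) = lambda, solved in transposed form.
    return np.linalg.solve((np.eye(k) - routing).T, external)

# Hypothetical example: class 1 becomes class 2, which returns as class 1 w.p. 0.5.
lam = [1.0, 0.0]
r = [[0.0, 1.0],
     [0.5, 0.0]]
print(effective_rates_open(lam, r))   # half the traffic loops back, so both rates double
```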

In the steady state the various classes of customers in the network can either:

1. form closed loops with no arrivals or departures, or

2. form no loops.

(Closed loops with arrivals and no departures and closed loops with departures and no arrivals obviously cannot exist in the steady state.)

If all classes of customers form no loops, then the effective arrival rates are uniquely defined by $\lambda^{eff}(k) = \lambda(k) + \sum_l \lambda^{eff}(l) \, r_{lk}$. In this case the network is said to be open and the normalizing constant in the product form equation is $C = 1$. If all classes form closed loops with no arrivals or departures, then the effective arrival rates are given up to a multiplicative constant by $\lambda^{eff}(k) = \sum_l \lambda^{eff}(l) \, r_{lk}$. In this case the network is said to be closed and the normalizing constant is such that the sum of all probabilities is 1. Otherwise the network is said to be mixed. In this case $\lambda^{eff}(k)$ is uniquely determined for those classes that form no loops and determined up to a constant for those classes that form closed loops.
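For a closed network the same equations only fix $\lambda^{eff}$ up to a constant, so the natural computation is a null-space one. The sketch below assumes the classes form a single closed chain (so the null space is one-dimensional) and picks the normalization $\sum_k \lambda^{eff}(k) = 1$, which is an arbitrary, illustrative choice; the function name is hypothetical.

```python
import numpy as np

def effective_rates_closed(routing):
    """Return lambda_eff solving lambda_eff = lambda_eff R, scaled to sum to 1.

    routing -- K x K matrix with routing[k, l] = r_{kl}; every row sums to 1.
    Assumes a single closed chain, so the solution is unique up to scaling.
    """
    routing = np.asarray(routing, dtype=float)
    k = routing.shape[0]
    # lambda (R - I) = 0  <=>  (R - I)^T lambda^T = 0: take the right singular vector
    # of (R - I)^T that belongs to the smallest (zero) singular value.
    _, _, vt = np.linalg.svd((routing - np.eye(k)).T)
    v = vt[-1]
    return v / v.sum()

# The G/M/1//N network of section 2.6.2: class 1 (processors) <-> class 2 (FCFS memory).
print(effective_rates_closed([[0.0, 1.0], [1.0, 0.0]]))   # equal effective rates
```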

We conclude this section on product form solutions by noting that the same results have been reached by others, notably Baskett et al. [B1], by close examination of the global balance equations in the method of stages. For certain cases these global balance equations reduce to local balance equations for which it is easy to determine the equilibrium probability distribution. Kelly's treatment via the quasi-reversibility of the queues generalizes earlier work (distributions with non-rational Laplace transforms and any queue fitting the description given earlier for generalized M/M/· queues or symmetric queues can be treated) and unifies it through the concept of quasi-reversibility.

† Departures from the network can be handled by defining a certain class for departed customers. However, it is traditional to avoid defining an explicit class for departures, resulting in $\sum_l r_{kl} < 1$ if class $k$ customers can leave the network.


2.6.2 G/M/1//N Model as a Queueing Network

The G/M/1//N model can be considered as a closed queueing network consisting of a FCFS queue with an exponential service time distribution (same mean for all customers) and an infinite server queue, as depicted below:

Figure 2.9: Queueing network for G/M/1//N model (labels: Processors, General service)

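All customers are identical. Let all customers in the infinite server queue be class 1 with mean service time $\bar{t}_p$. Let all customers in the FCFS queue be class 2 with mean service time $\bar{t}_a$. Thus $r_{12} = r_{21} = 1$ and $\lambda^{eff}(1) = \lambda^{eff}(2)$. Each queue is quasi-reversible in isolation. Therefore from section 2.6.1 we have for the infinite server queue with a state of $x_1 = (n_1)$:

$$\pi_{x_1} = \kappa_1 \frac{\rho_1^{n_1}}{n_1!}, \qquad \rho_1 = \lambda^{eff}(1) \, \bar{t}_p.$$

For the FCFS queue we have for a state of $x_2 = (n_2)$:

$$\pi_{x_2} = \kappa_2 \, \rho_2^{n_2}, \qquad \rho_2 = \lambda^{eff}(2) \, \bar{t}_a.$$

Thus for the overall state $X = (x_1, x_2) = (n_1, n_2)$ we have

$$\pi_X = C \kappa_1 \kappa_2 \, \frac{\rho_1^{n_1}}{n_1!} \, \rho_2^{n_2}.$$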
Since $n_1 + n_2 = N$, the state reduces to $X = (n_2)$ and the steady-state probability distribution of $n_2$ customers in the FCFS queue is:

$$\pi_{n_2} = \kappa \, \frac{N!}{(N - n_2)!} \left( \frac{\bar{t}_a}{\bar{t}_p} \right)^{n_2}, \qquad 0 \le n_2 \le N,$$

where $\kappa$ is a normalizing constant ($\kappa = C \kappa_1 \kappa_2 \, (\lambda^{eff}(1) \, \bar{t}_p)^N / N!$).


Aside from the change in notation, this equation is exactly the same as equation 2.1 in section 2.4 for the steady-state probability of $n_2$ customers in the M/M/1//N system. Therefore the M/M/1//N and G/M/1//N models have exactly the same mean waiting times per request if the mean processing and access times are respectively the same for each model. (The reader is thus referred to the graph for the M/M/1//N case in lieu of a graph here.) This is a surprising result considering that the processing time distribution is arbitrary. As we shall see in the next section, the key to this behavior is the exponential distribution of the service time at the FCFS queue.
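Since the G/M/1//N queue-length distribution coincides with equation 2.1, its mean waiting time can be computed from the M/M/1//N probabilities alone, whatever the processing time distribution. A sketch using Little's law, with `t_p` the mean processing time and `t_a` the mean access time (the function name is hypothetical):

```python
from math import factorial

def finite_source_wait(n_procs, t_p, t_a):
    """Mean waiting time per request (queueing only) for M/M/1//N, hence for G/M/1//N.

    pi(n2) is proportional to N!/(N - n2)! * (t_a / t_p)**n2. Then, by Little's law,
    throughput X = (1 - pi(0)) / t_a, response time T = L / X, and W = T - t_a.
    """
    r = t_a / t_p
    weights = [factorial(n_procs) / factorial(n_procs - n) * r ** n
               for n in range(n_procs + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    mean_queue = sum(n * p for n, p in enumerate(pi))     # L at the memory
    throughput = (1.0 - pi[0]) / t_a
    return mean_queue / throughput - t_a

for n in (1, 2, 4, 8):
    print(n, round(finite_source_wait(n, t_p=1.0, t_a=1.0), 4))
```

For large $N$ the memory saturates and the result approaches $N \bar{t}_a - \bar{t}_p - \bar{t}_a$.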
2.7 General Processing and Access Time Distributions - G/G/1//N

In this section we consider the full generality of the basic model studied so far. Unfortunately, the G/G/1//N model is difficult to solve exactly. We no longer have the convenience of memoryless (i.e. exponential) processing times as in the M/G/1//N case, or the luck to have a product form solution due to the exponential service time as in the G/M/1//N case. Imbedded Markov chains and supplementary variable methods are hopelessly complex. This leaves the method of stages, as complicated as it may be. Of course, as mentioned in section 2.5, explicit closed form solutions cannot generally be obtained with the method of stages. Simulation is also a possible alternative. However, simulation is not very useful for systematically determining the effect of various parameter changes, so we leave it as a last resort. Approximation, which does not suffer from this weakness, is perhaps the most attractive alternative in this case. Rather than pursue a lengthy investigation of approximation techniques for the G/G/1//N system, we refer the reader to Halachmi and Franta [H1] and Whitt [W2].

One simple way to approximate the solution of the G/G/1//N model is to replace the FCFS queue by either a server-sharing queue or an LCFS queue with preemption. Both of these queues are symmetric, and the processors can be represented by an infinite server queue as in section 2.6.2. Therefore both queues are quasi-reversible and a product form solution exists. In fact the analysis and solution are exactly the same as those in section 2.6.2! Thus this approximation gives no more information than that in section 2.4. (Actually it does: it demonstrates that under different service disciplines the G/G/1//N model has very simple solutions.)

2.7.1 Mean Waiting Time in PH/PH/1//N Model

In this section we derive, using the method of stages, a solution for the mean waiting time per request in the G/G/1//N model. Our approach is to relate the solution of the G/G/1//N model to the solution of the G/G/1//(N-1) model (i.e. the same model, with the same processing and access time distributions, but one less processor) and then find the solution by solving a smaller problem based on the solution of the G/G/1//(N-1) model. This recursive approach was motivated by the proof of Theorem 2.2 in Appendix B. Herzog, Woo, and Chandy [H2] have outlined in general terms the solution of queueing problems by a recursive technique, so the concept we apply is not new. However, we have not found any references in the literature concerning recursive techniques specifically applied to the G/G/1//N system. General motivation for much of the content in this section, such as the block partitioning of the generator matrix and the PH distribution, is due to the work of Neuts [N1].

Neuts has studied continuous time Markov processes with a countably infinite number of states where the generator matrix $Q$‡ has the following (canonical) block matrix form:

$$Q = \begin{pmatrix} B_0 & A_0 & 0 & 0 & \cdots \\ B_1 & A_1 & A_0 & 0 & \cdots \\ B_2 & A_2 & A_1 & A_0 & \cdots \\ B_3 & A_3 & A_2 & A_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

where all matrices are $m \times m$. Neuts [N1] has shown the following result concerning such processes:

If the matrix $Q$ is irreducible and positive recurrent (explained below), then the stationary probability vector $\pi$ of $Q$, when partitioned to agree with the partitioning of $Q$, has the matrix-geometric form

$$\pi_i = \pi_0 R^i, \qquad i \ge 0,$$

where the matrix $R$ is the minimal nonnegative* solution of $\sum_{k=0}^{\infty} R^k A_k = 0$.

‡ The nondiagonal elements of a generator matrix $Q$, $q_{ij}$ for $i \ne j$, indicate the transition rate from state $i$ to state $j$ in the associated continuous time Markov process. The diagonal elements are given by $q_{ii} = -\sum_{j \ne i} q_{ij}$. A generator matrix $Q$ has the property that $\pi Q = 0$ in the steady state, where $\pi$ is the vector of steady-state probabilities.

* Minimal in the sense that $R \le X$ (element-wise) for any other solution $X \ne R$ of $\sum_{k=0}^{\infty} X^k A_k = 0$.

The matrix $Q$ is irreducible if every state can be reached from every other state; informally, the system has no independent subsystems, so all subsystems interact and are dependent. This ensures that the steady-state solution (if it exists) is independent of the initial state. It is usually evident by inspection or construction that $Q$ is irreducible. Requiring that $Q$ be positive recurrent is essentially just requiring that the process is stable (i.e. the queue size does not grow indefinitely) so that a steady state exists. We will not be concerned about positive recurrence here since our closed system G/G/1//N model will have only a finite number of states and we will assume it to be irreducible; thus the corresponding matrix $Q$ will necessarily be positive recurrent.

We will hypothesize that the steady-state probability vector of the G/G/1//N model (when represented by the method of stages) has a similar matrix-geometric form. Our G/G/1//N model will have only a finite number of states; thus our approach will be similar to, but different from, that outlined above for infinite dimensional systems. The key aspect of Neuts' result is the matrix-geometric form of the steady-state probability vector.

In the following, we will use the phase distribution (denoted by PH) originated by Neuts [N1]. The PH distribution is really just a convenient matrix formulation of the method of stages. (Indeed, some authors use "phase" instead of "stage".) This formulation provides a much needed structure for the method of stages and unifies many widely disparate formulations of Erlangian, series/parallel, and stage type distributions. PH distributions are, however, a subset of those obtained by Cox [C4] in that all the poles of the Laplace transform of a PH distribution are real (as opposed to the complex poles allowed in Cox's formulation). This restriction to real poles allows PH distributions to be directly related to finite state Markov processes and allows them to be realizable using only real arithmetic.
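For intuition about Neuts' result, consider the quasi-birth-death special case ($A_k = 0$ for $k \ge 3$, $B_k = 0$ for $k \ge 2$), where the equation for $R$ reduces to $A_0 + R A_1 + R^2 A_2 = 0$. A simple fixed-point iteration started from $R = 0$ is a standard way to reach the minimal nonnegative solution. The sketch below is illustrative only (no convergence acceleration, hypothetical function name); the scalar check is the M/M/1 queue, whose minimal root is $\lambda/\mu$, so that $\pi_i = \pi_0 R^i$ reproduces the geometric distribution.

```python
import numpy as np

def minimal_rate_matrix(a0, a1, a2, tol=1e-12, max_iter=100000):
    """Minimal nonnegative solution R of A0 + R A1 + R^2 A2 = 0 by fixed-point iteration.

    Quasi-birth-death special case of sum_k R^k A_k = 0. Starting from R = 0,
    the iterates R <- -(A0 + R^2 A2) A1^{-1} increase toward the minimal solution.
    """
    a0, a1, a2 = (np.asarray(m, dtype=float) for m in (a0, a1, a2))
    a1_inv = np.linalg.inv(a1)
    r = np.zeros_like(a0)
    for _ in range(max_iter):
        r_next = -(a0 + r @ r @ a2) @ a1_inv
        if np.max(np.abs(r_next - r)) < tol:
            return r_next
        r = r_next
    raise RuntimeError("iteration did not converge")

# Scalar check with an M/M/1 queue (lambda = 1, mu = 2): the minimal root is lambda/mu.
lam, mu = 1.0, 2.0
print(minimal_rate_matrix([[lam]], [[-(lam + mu)]], [[mu]]))
```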

A continuous parameter PH distribution $F(x)$ on $[0, \infty)$ has the following formulation:

$$Q = \begin{pmatrix} T & T^0 \\ 0 & 0 \end{pmatrix}$$

where $T$ is an $m \times m$ nonsingular (i.e. invertible) matrix, $T^0$ is an $m \times 1$ column vector, and $Te + T^0 = 0$, where $e$ is an $m \times 1$ column vector of 1's. The matrix $Q$ represents the generator of an $(m+1)$-state Markov process. The transition between any state $i \in \{1, 2, \ldots, m\}$ and state $j \in \{1, 2, \ldots, m\}$, $j \ne i$, is governed by an exponential distribution with rate $T_{ij}$. Similarly, the transition between any state $i \in \{1, 2, \ldots, m\}$ and state $m+1$ is governed by an exponential distribution with rate $T_i^0$ (so that $T_{ii} = -(T_i^0 + \sum_{j \ne i} T_{ij})$). The states $1, 2, \ldots, m$ are transient and state $m+1$ is absorbing. The initial probability vector is $(\alpha, \alpha_{m+1})$, where $\alpha$ is a $1 \times m$ row vector and $\alpha_i$ is the probability of starting in phase $i$ ($\alpha e + \alpha_{m+1} = 1$). The random variable $x$ is defined as the time until absorption in the above Markov process. The distribution of $x$ is $F(x) = 1 - \alpha e^{Tx} e$, $x \ge 0$. The pair $(\alpha, T)$ is called the representation of $F(x)$ and the dimension of the square matrix $T$ is called the order of $F(x)$.
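The formula $F(x) = 1 - \alpha e^{Tx} e$ can be evaluated without a general matrix exponential by uniformization, which suits PH generators because every term in the series is nonnegative. The sketch below is a minimal illustration (function names hypothetical; it refuses very large $qx$ rather than rescaling), together with the standard mean $E[x] = -\alpha T^{-1} e$.

```python
import numpy as np

def ph_mean(alpha, t):
    """Mean time to absorption E[x] = -alpha T^{-1} e for the representation (alpha, T)."""
    alpha = np.asarray(alpha, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(-alpha @ np.linalg.solve(t, np.ones(t.shape[0])))

def ph_cdf(alpha, t, x, tol=1e-12):
    """F(x) = 1 - alpha exp(Tx) e, with exp(Tx) evaluated by uniformization.

    With q = max_i |T_ii|, P = I + T/q is substochastic and
    exp(Tx) = sum_k exp(-qx) (qx)^k / k! P^k, a series of nonnegative terms.
    """
    alpha = np.asarray(alpha, dtype=float)
    t = np.asarray(t, dtype=float)
    q = float(np.max(-np.diag(t)))
    if q * x > 700:
        raise ValueError("q*x too large for this simple uniformization sketch")
    p = np.eye(t.shape[0]) + t / q
    v = alpha.copy()              # holds alpha P^k
    weight = np.exp(-q * x)       # Poisson(qx) probability of exactly k jumps
    cumulative = weight
    survival = weight * v.sum()   # accumulates alpha exp(Tx) e
    k = 0
    while 1.0 - cumulative > tol:
        k += 1
        v = v @ p
        weight *= q * x / k
        cumulative += weight
        survival += weight * v.sum()
    return 1.0 - survival
```

For example, `ph_cdf([1.0, 0.0], [[-1.0, 1.0], [0.0, -1.0]], 1.5)` is the second order Erlangian CDF $1 - e^{-x}(1 + x)$ at $x = 1.5$.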

As an example, a third order Erlangian distribution ($E_3$) can be formulated as a PH distribution as follows:

Erlangian: three stages (Stage 1, Stage 2, Stage 3) traversed in series. Each stage has an exponential distribution with rate $\mu$.

PH distribution:

$$T = \begin{pmatrix} -\mu & \mu & 0 \\ 0 & -\mu & \mu \\ 0 & 0 & -\mu \end{pmatrix}, \qquad T^0 = \begin{pmatrix} 0 \\ 0 \\ \mu \end{pmatrix}, \qquad \alpha = (1 \;\; 0 \;\; 0), \qquad \alpha_4 = 0.$$
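The same representation can be written down for any number of stages $r$. The sketch below (the helper name `erlang_ph` is hypothetical) builds $(\alpha, T, T^0)$ and checks $Te + T^0 = 0$ and the mean $r/\mu$ for the third order case with an illustrative stage rate $\mu = 2$.

```python
import numpy as np

def erlang_ph(r, mu):
    """PH representation (alpha, T, T0) of an r-stage Erlangian with stage rate mu.

    Phase i moves to phase i + 1 at rate mu; the last phase is absorbed at rate mu.
    """
    t = -mu * np.eye(r) + mu * np.eye(r, k=1)   # -mu on the diagonal, mu just above it
    t0 = np.zeros(r)
    t0[-1] = mu                                 # only the last stage leads to absorption
    alpha = np.zeros(r)
    alpha[0] = 1.0                              # always start in stage 1 (alpha_{r+1} = 0)
    return alpha, t, t0

alpha, t, t0 = erlang_ph(3, mu=2.0)
print(t)
print(np.allclose(t @ np.ones(3) + t0, 0.0))        # Te + T0 = 0
print(-alpha @ np.linalg.solve(t, np.ones(3)))      # mean r / mu = 1.5
```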

We now consider the G/G/1//N model where the processing time distribution is PH with representation $(\alpha, T)$, order $m$, and $\alpha_{m+1} = 0$, and the access time distribution is PH with representation $(\beta, S)$, order $\nu$, and $\beta_{\nu+1} = 0$. The states in the resulting PH/PH/1//N model can be described by:

$$(n, s, j_1, \ldots, j_N)$$

where $n$ is the number of requests queued for or in service, $0 \le n \le N$; $s$ is the current phase of the service (i.e. access time distribution), $1 \le s \le \nu$; $j_i$ is the current phase of the processing at processor $i$, $1 \le j_i \le m$; and $s$ and $j_i$ are simply omitted (or taken to be zero) when there is no request in service or when processor $i$ is idle, respectively.

This gives a total of $m^N + \nu \sum_{j=0}^{N-1} m^j$ states. Since all the processors are assumed to be identical, we can reduce the number of states by considering the state description:

$$(n, s, p_1, \ldots, p_m)$$

where $p_i$ denotes the number of processors in which the processing is in phase $i$, $1 \le i \le m$, $0 \le p_i \le N - n$, $\sum_{i=1}^{m} p_i = N - n$, and $n$ and $s$ are as before. This gives a total of

$$\binom{N + m - 1}{m - 1} + \nu \sum_{j=1}^{N} \binom{N - j + m - 1}{m - 1}$$

states.


As an example, consider the E2/E2/1//N system with $N = 3$. The state transition diagram for the system is given in figure 2.10. The corresponding generator matrix, if the states are labeled in lexicographical order (i.e. in the order (0,0,0,3), (0,0,1,2), (0,0,2,1), (0,0,3,0), (1,1,0,2), (1,1,1,1), (1,1,2,0), (1,2,0,2), (1,2,1,1), (1,2,2,0), (2,1,0,1), (2,1,1,0), (2,2,0,1), (2,2,1,0), (3,1,0,0), (3,2,0,0)), is given in figure 2.11. Notice the block tridiagonal form of $Q$. A process having a matrix $Q$ of this form is called a quasi-birth-death (QBD) process. Figure 2.12 shows the generator matrix for the general case of a PH/PH/1//N system with the processing time distribution of order 2, the access time distribution of order 2, and $N = 3$. Again, note the block tridiagonal form of $Q$.
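The reduced state space and its size are easy to cross-check by enumeration. The sketch below (function names hypothetical) lists the states $(n, s, p_1, \ldots, p_m)$ in lexicographic order, with $s = 0$ when no request is in service, and compares the count with the closed-form total; for $N = 3$, $m = \nu = 2$ it reproduces the 16 states listed above in the same order.

```python
from itertools import product
from math import comb

def reduced_states(n_procs, m, nu):
    """List states (n, s, p_1, ..., p_m) of the PH/PH/1//N model in lexicographic order.

    n: requests queued or in service; s: access phase (0 when n == 0);
    p_i: number of processors whose processing is in phase i, with sum(p) = N - n.
    """
    states = []
    for n in range(n_procs + 1):
        phases = [0] if n == 0 else range(1, nu + 1)
        for s in phases:
            for p in product(range(n_procs - n + 1), repeat=m):
                if sum(p) == n_procs - n:
                    states.append((n, s) + p)
    return sorted(states)

def state_count(n_procs, m, nu):
    """Closed-form count C(N+m-1, m-1) + nu * sum_{j=1}^{N} C(N-j+m-1, m-1)."""
    return comb(n_procs + m - 1, m - 1) + nu * sum(
        comb(n_procs - j + m - 1, m - 1) for j in range(1, n_procs + 1))

states = reduced_states(3, m=2, nu=2)
print(len(states), state_count(3, 2, 2))
print(states[:4], states[-2:])
```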


Figure 2.11: Generator matrix for E2/E2/1//3 example
Figure 2.12: Generator matrix for PH2/PH2/1//3 system
If we label the states in the same lexicographical order in the general case, then we obtain the generator:

$$Q = \begin{pmatrix} H_0 & C_0 & & & \\ A_1 & H_1 & C_1 & & \\ & A_2 & H_2 & \ddots & \\ & & \ddots & \ddots & C_{N-1} \\ & & & A_N & H_N \end{pmatrix}$$

where:

$H_i$ is a square matrix of dimension $\nu(i)\binom{N-i+m-1}{m-1}$ denoting the transition rates between states with $i$ requests in the queue;

$A_i$ is a $\nu(i)\binom{N-i+m-1}{m-1} \times \nu(i-1)\binom{N-i+m}{m-1}$ matrix denoting the transition rates from states with $i$ requests in the queue to states with $i-1$ requests in the queue ($\nu(i) = 1$ if $i = 0$ and $\nu(i) = v$ otherwise);

and $C_i$ is a $\nu(i)\binom{N-i+m-1}{m-1} \times \nu(i+1)\binom{N-i+m-2}{m-1}$ matrix denoting the transition rates from states with $i$ requests in the queue to states with $i+1$ requests in the queue.

More details about these matrices will be given later as necessary. We partition the steady state probability vector $\pi$ (given by $\pi Q = 0$ and $\pi e = 1$) into the vectors $\pi_0, \pi_1, \ldots, \pi_N$, matching the partitioning of Q. The steady state equations are now:

$$\pi_0 H_0 + \pi_1 A_1 = 0 \tag{2.6}$$

$$\pi_{i-1} C_{i-1} + \pi_i H_i + \pi_{i+1} A_{i+1} = 0, \quad 0 < i < N \tag{2.7}$$

$$\pi_{N-1} C_{N-1} + \pi_N H_N = 0 \tag{2.8}$$

One way to solve these equations is to adapt Neuts' matrix-geometric approach. Since the matrices are now functions of $i$, consider a rate matrix that is a function of $i$, i.e. $R(i)$, and guess that $\pi_i$ has the form $\pi_i = \pi_{i-1}R(i)$, $1 \le i \le N$. Substituting this expression for $\pi_i$ into the steady state equations we obtain

$$\pi_{N-1}\left(C_{N-1} + R(N)H_N\right) = 0$$

$$\pi_{i-1}\left(C_{i-1} + R(i)H_i + R(i)R(i+1)A_{i+1}\right) = 0, \quad 0 < i < N$$

$$\pi_0\left(H_0 + R(1)A_1\right) = 0$$

If the appropriate inverses exist we have:

$$R(N) = -C_{N-1}H_N^{-1}$$

$$R(i) = -C_{i-1}\left(H_i + R(i+1)A_{i+1}\right)^{-1}, \quad 0 < i < N$$

$$R(1) = -H_0A_1^{-1}$$
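As a sketch of the iterative substitution (not the author's code), the Python/NumPy function below computes $R(N), \ldots, R(1)$ by the backward recursion above. Rather than inverting $A_1$, which is generally not square, it assumes the level-0 equation $\pi_0(H_0 + R(1)A_1) = 0$ plus normalization determines $\pi_0$. The function name and the list-of-blocks layout are assumptions. The usage check uses the M/M/1//N model (m = v = 1, 1 x 1 blocks), whose queue-length distribution is proportional to $N!/(N-n)!\,(\lambda/\mu)^n$.

```python
from math import factorial

import numpy as np


def solve_by_rate_matrices(H, A, C):
    """Stationary block vectors pi_0..pi_N of a block-tridiagonal generator.

    H[i] is the diagonal block of level i, A[i] maps level i -> i-1 (A[0] is
    unused) and C[i] maps level i -> i+1 (C[N] is unused). Uses
    pi_i = pi_{i-1} R(i) with the backward recursion for R(N), ..., R(1).
    """
    N = len(H) - 1
    R = [None] * (N + 1)
    if N >= 1:
        R[N] = -C[N - 1] @ np.linalg.inv(H[N])
        for i in range(N - 1, 0, -1):
            R[i] = -C[i - 1] @ np.linalg.inv(H[i] + R[i + 1] @ A[i + 1])
        level0 = H[0] + R[1] @ A[1]
    else:
        level0 = H[0]
    # Solve pi_0 level0 = 0: replace one column by ones to fix the scale.
    M = level0.copy()
    M[:, -1] = 1.0
    rhs = np.zeros(M.shape[0])
    rhs[-1] = 1.0
    pi = [np.linalg.solve(M.T, rhs)]
    for i in range(1, N + 1):
        pi.append(pi[-1] @ R[i])
    total = sum(p.sum() for p in pi)
    return [p / total for p in pi]


# Check on the M/M/1//N model: request rate lam per processing processor,
# access rate mu, one state per level.
N, lam, mu = 4, 0.25, 1.0
H = [np.array([[-((N - n) * lam + (mu if n > 0 else 0.0))]]) for n in range(N + 1)]
A = [np.array([[mu]]) for n in range(N + 1)]
C = [np.array([[(N - n) * lam]]) for n in range(N + 1)]
P = np.array([p.sum() for p in solve_by_rate_matrices(H, A, C)])
w = np.array([factorial(N) / factorial(N - n) * (lam / mu) ** n for n in range(N + 1)])
expected = w / w.sum()
print(np.round(P, 6), np.allclose(P, expected))
```

Note that every step inverts a dense matrix whose dimension is that of a whole level, which is exactly the cost discussed below.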

A solution technique by iterative substitution is now apparent. This particular matrix-product approach is again (to the best of our knowledge) new. However, it is a rather infeasible approach; the main difficulty is posed by finding the inverses of the various matrices. For large $N$ and even just small values of $m$ and $v$, the dimension of $H_i$ for small $i$ is very large, implying that large dimensional matrices must be inverted. Finding the inverses of large matrices is computationally very inefficient. Furthermore, the inverse of a sparse matrix is usually quite dense. Therefore it is difficult to use any sparsity present in the $A_i$, $H_i$, and $C_i$ matrices to reduce the computational requirements in any of the other matrix operations. The non-sparsity also implies large storage requirements. Another difficulty is posed by the varying dimensions of all the matrices involved: even $R(i)$ has a size that is a function of $i$. This makes any practical implementation difficult and complex, since the solution of each $R(i)$ is essentially a special case. Finally, a great deal of work is required for the solution with $N$ processors ($N+1$ matrix inverses and many matrix multiplies and adds), and it must all be repeated if we also want the solution for $N+1$ processors.

The key idea in this section is the following simple observation.

Some of the steady state probabilities of the G/G/1//N system are related by a multiplicative constant to the steady state probabilities of the same system with one less processor (i.e. the G/G/1//(N-1) system). Specifically, for our PH/PH/1//N system with state $(n, s, p_1, \ldots, p_m)$ and steady state probabilities for $N$ processors denoted by $\pi^{(N)}(n, s, p_1, \ldots, p_m)$, we have:

$$\pi^{(N)}(n, s, p_1, \ldots, p_m) = C\,\pi^{(N-1)}(n-1, s, p_1, \ldots, p_m), \quad 2 \le n \le N \tag{2.9}$$

where $C$ is a constant. The proof of this claim for the general case (i.e. for the G/G/1//N system) is given during the proof of a theorem in Appendix B.
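Relation 2.9 can be checked numerically in the exponential special case (m = v = 1, the M/M/1//N model), where each level contains a single state. The Python/NumPy sketch below (the helper name is hypothetical, and it covers only this special case, not the general PH/PH/1//N claim) solves $\pi Q = 0$ directly for $N$ and $N-1$ processors and forms the ratios $\pi^{(N)}(n)/\pi^{(N-1)}(n-1)$ for $2 \le n \le N$, which should all equal the same constant $C$.

```python
import numpy as np


def mm1n_stationary(N, lam, mu):
    """Stationary probabilities of n = 0..N requests at the bus in the
    M/M/1//N model, obtained by solving pi Q = 0 with sum(pi) = 1."""
    Q = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            Q[n, n + 1] = (N - n) * lam     # a processing processor issues a request
        if n > 0:
            Q[n, n - 1] = mu                # the bus completes an access
        Q[n, n] = -Q[n].sum()
    M = Q.T.copy()
    M[-1, :] = 1.0                          # replace one equation by the normalisation
    rhs = np.zeros(N + 1)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


N, lam, mu = 6, 0.3, 1.0
pN, pN1 = mm1n_stationary(N, lam, mu), mm1n_stationary(N - 1, lam, mu)
ratios = pN[2:] / pN1[1:]                   # pi^(N)(n) / pi^(N-1)(n-1), 2 <= n <= N
print(ratios)
```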

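Therefore, if we denote the partitioned steady state vector as $\pi = (\pi_0, \pi_1, \ldots, \pi_N)$ for the PH/PH/1//N system and let $\pi' = (\pi'_0, \pi'_1, \ldots, \pi'_{N-1})$ be the corresponding vector for the identical system with one less processor, then equation 2.9 gives $\pi_i = C\pi'_{i-1}$ for $2 \le i \le N$, and equations 2.6 and 2.7 (with $i = 1$) become

$$\pi_0 H_0 + \pi_1 A_1 = 0 \tag{2.10(a)}$$

$$\pi_0 C_0 + \pi_1 H_1 = -\pi_2 A_2 = -C\pi'_1 A_2 \tag{2.10(b)}$$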


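Assuming that Q is irreducible (which we will assume in the rest of this section), equations 2.10(a) and 2.10(b) represent $\binom{N+m-1}{m-1} + v\binom{N+m-2}{m-1}$ linearly independent equations in the same number of unknowns. If Q is irreducible, every proper principal submatrix of Q is nonsingular, and thus $H_0^{-1}$ exists, yielding

$$\pi_0 = -\pi_1 A_1 H_0^{-1} \quad\text{and}\quad \pi_1\left(H_1 - A_1H_0^{-1}C_0\right) = -C\pi'_1 A_2$$

where $\pi'_1$ is the level-1 vector of the system with one less processor. Finally, $\left(H_1 - A_1H_0^{-1}C_0\right)^{-1}$ also exists if Q is irreducible, yielding

$$\pi_1 = -C\pi'_1 A_2\left(H_1 - A_1H_0^{-1}C_0\right)^{-1}, \quad N \ge 2$$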


Let $\pi_i e$ denote the steady state probability of $i$ requests in the queue, where $e$ denotes a column vector of 1's of appropriate dimension. Then the constant $C$ can be determined by the requirement that $\sum_{i=0}^{N}\pi_i e = 1$. Therefore we now have a recursive formulation for determining $\pi_i$, $0 \le i \le N$, for any $N > 1$. The solution for $N = 1$ can be found by solving $\pi_0 H_0 + \pi_1 A_1 = 0$ and $\pi_0 C_0 + \pi_1 H_1 = 0$, where $H_0$ is $m \times m$, $A_1$ is $v \times m$, $C_0$ is $m \times v$, and $H_1$ is $v \times v$. (Note that the dimensions of the matrices are functions of $N$.) If Q is irreducible we have $\pi_0 = -\pi_1 A_1 H_0^{-1}$ and $\pi_1\left(H_1 - A_1H_0^{-1}C_0\right) = 0$, where the inverses exist. In addition we have $\pi_0 e + \pi_1 e = 1$, yielding $\pi_1\left(e - A_1H_0^{-1}e\right) = 1$. These equations are easy to solve for reasonable values of $m$ and $v$.


The mean waiting time for any $N$ can be determined by applying Little's Law twice, as in section 2.3, to yield

$$\bar t_w = \bar t_a\,\frac{\sum_{i=1}^{N}(i-1)\,\pi_i e}{\sum_{i=1}^{N}\pi_i e} \tag{2.11}$$


Since the normalization factor for the level probabilities $\pi_i e$ cancels out of equation 2.11, it is not necessary to determine the constant $C$ in equation 2.9 if these probabilities are just being used to compute $\bar t_w$.
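Equation 2.11 needs only the level probabilities, up to a common factor. A minimal Python sketch (the function name is hypothetical) evaluates it from unnormalized weights `P[i]` proportional to $\pi_i e$ and checks it on the M/M/1//N model, where $P_i \propto N!/(N-i)!\,(\bar t_a/\bar t_p)^i$ and Little's Law gives $\bar t_w/\bar t_a = N/\rho - \alpha - 1$ with $\alpha = \bar t_p/\bar t_a$ and $\rho$ the bus utilization.

```python
from math import factorial


def wait_over_access(P):
    """Equation 2.11: t_w / t_a = sum_{i>=1} (i - 1) P_i / sum_{i>=1} P_i.

    P[i] is proportional to the probability of i requests in the queue; any
    common normalisation factor cancels.
    """
    busy = sum(P[1:])
    return sum((i - 1) * P[i] for i in range(1, len(P))) / busy


N, alpha = 8, 4.0                      # alpha = mean processing / mean access time
P = [factorial(N) / factorial(N - i) * alpha ** (-i) for i in range(N + 1)]
rho = sum(P[1:]) / sum(P)              # bus utilisation
print(wait_over_access(P), N / rho - alpha - 1)
```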


To avoid the computational inefficiencies associated with the matrix inversions and to retain the advantages afforded by sparse matrices, it is best to solve equations 2.10(a) and 2.10(b) using Gaussian elimination or Gauss-Seidel iteration. The $H_0$ matrix is very sparse. In the following, the state corresponding to row $i$ is denoted by $(n^i, s^i, p^i_1, \ldots, p^i_m)$ and the state corresponding to column $j$ is denoted by $(n^j, s^j, p^j_1, \ldots, p^j_m)$. Element $(i,j)$ of $H_0$ is given by:


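(i) $i \ne j$: $(H_0)_{ij} = p^i_l T_{lk}$ if $n^i = n^j = 0$, $s^i = s^j$, and $l$ and $k$ are the unique values (if any) such that $p^i_q = p^j_q$ for every $q \ne l, k$, and $p^j_l = p^i_l - 1 \ge 0$, $p^j_k = p^i_k + 1 \le N$; $(H_0)_{ij} = 0$ otherwise.

(ii) $i = j$: $(H_0)_{ii} = -\sum_{j \ne i}(H_0)_{ij} - \sum_{j}(C_0)_{ij}$.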

The $C_0$ matrix is in general not as sparse. Element $(i,j)$ of $C_0$ is given by: $(C_0)_{ij} = p^i_l T^0_l \beta_k$ if $n^i = 0$, $n^j = 1$, $s^i = 0$, $s^j = k$, and $l$ is the unique value (if any) such that $p^j_q = p^i_q$ for every $q \ne l$ and $p^j_l = p^i_l - 1 \ge 0$; $(C_0)_{ij} = 0$ otherwise.


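The $H_1$ matrix is again very sparse. Element $(i,j)$ of $H_1$ is given by:

(i) $i \ne j$:
$(H_1)_{ij} = p^i_l T_{lk}$ if $n^i = n^j = 1$, $s^i = s^j$, and $l$ and $k$ are the unique values (if any) such that $p^i_q = p^j_q$ for every $q \ne l, k$, and $p^j_l = p^i_l - 1 \ge 0$, $p^j_k = p^i_k + 1 \le N - 1$;
$(H_1)_{ij} = S_{tu}$ if $n^i = n^j = 1$, $s^i = t$, $s^j = u$, and $p^i_q = p^j_q$ for every $q$;
$(H_1)_{ij} = 0$ otherwise.

(ii) $i = j$: $(H_1)_{ii} = -\sum_{j \ne i}(H_1)_{ij} - \sum_{j}(A_1)_{ij} - \sum_{j}(C_1)_{ij}$,

where $(C_1)_{ij} = p^i_l T^0_l$ if $n^i = 1$, $n^j = 2$, $s^i = s^j$, and $l$ is the unique value (if any) such that $p^j_q = p^i_q$ for every $q \ne l$ and $p^j_l = p^i_l - 1 \ge 0$; $(C_1)_{ij} = 0$ otherwise.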


The $A_1$ matrix is given by: $(A_1)_{ij} = S^0_t \alpha_k$ if $n^i = 1$, $n^j = 0$, $s^i = t$, $s^j = 0$, and $k$ is the unique value (if any) such that $p^j_q = p^i_q$ for every $q \ne k$ and $p^j_k = p^i_k + 1 \le N$; $(A_1)_{ij} = 0$ otherwise.


The sparsity of all these matrices depends on the exact form of the phase distributions for the processing and access times. In the special case of Erlangian service, the matrix $C_0$ is also very sparse. The matrices still have large dimensions for large $N$, but now we can efficiently exploit the sparsity of the matrices to reduce both the computational and storage requirements.
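The element definitions above translate directly into code. The Python/NumPy sketch below builds Q for a PH/PH/1//N system on the aggregated states $(n, s, p_1, \ldots, p_m)$ in lexicographical order. It assumes that $(\alpha, T)$ and $(\beta, S)$ are the phase-type representations of the processing and access times, with $T^0 = -Te$ and $S^0 = -Se$. The function names are hypothetical, and Q is stored densely for brevity, where a practical version would use a sparse format and an iterative solver. Transitions out of levels $n \ge 2$ (the blocks $A_i$, $i \ge 2$, not spelled out above) are assumed to start the next queued request in access phase $j$ with probability $\beta_j$. The checks confirm zero row sums and the block tridiagonal structure, and that the M/M/1//N distribution is recovered when m = v = 1.

```python
from itertools import product
from math import factorial

import numpy as np


def ph_ph_1_N_generator(N, alpha, T, beta, S):
    """Generator Q of a PH/PH/1//N model on states (n, s, p_1, ..., p_m)."""
    alpha, T = np.asarray(alpha, float), np.asarray(T, float)
    beta, S = np.asarray(beta, float), np.asarray(S, float)
    m, v = len(alpha), len(beta)
    T0, S0 = -T.sum(axis=1), -S.sum(axis=1)     # phase exit rates
    states = [(n, s) + p
              for n in range(N + 1)
              for s in ([0] if n == 0 else range(1, v + 1))
              for p in product(range(N - n + 1), repeat=m) if sum(p) == N - n]
    index = {st: k for k, st in enumerate(states)}
    Q = np.zeros((len(states), len(states)))

    def add(src, n, s, p, rate):
        if rate > 0:
            Q[index[src], index[(n, s) + tuple(p)]] += rate

    for st in states:
        n, s, p = st[0], st[1], list(st[2:])
        for l in range(m):
            if p[l] == 0:
                continue
            for k in range(m):                   # processing phase l -> k
                if k != l:
                    q = p.copy()
                    q[l] -= 1
                    q[k] += 1
                    add(st, n, s, q, p[l] * T[l, k])
            q = p.copy()                         # processing ends: new request
            q[l] -= 1
            if n == 0:                           # C_0: access starts in phase k
                for k in range(v):
                    add(st, 1, k + 1, q, p[l] * T0[l] * beta[k])
            else:                                # C_i, i >= 1: request joins queue
                add(st, n + 1, s, q, p[l] * T0[l])
        if n > 0:
            t = s - 1
            for u in range(v):                   # access phase t -> u
                if u != t:
                    add(st, n, u + 1, p, S[t, u])
            for k in range(m):                   # access ends; processor restarts in k
                q = p.copy()
                q[k] += 1
                if n == 1:                       # A_1
                    add(st, 0, 0, q, S0[t] * alpha[k])
                else:                            # A_i, i >= 2 (assumed form)
                    for j in range(v):
                        add(st, n - 1, j + 1, q, S0[t] * alpha[k] * beta[j])
    Q[np.diag_indices_from(Q)] = -Q.sum(axis=1)
    return states, Q


def stationary(Q):
    """Dense solve of pi Q = 0, sum(pi) = 1 (small examples only)."""
    M = Q.T.copy()
    M[-1, :] = 1.0
    rhs = np.zeros(len(Q))
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


# E2/E2/1//3 with mean processing time 4 and mean access time 1
states, Q = ph_ph_1_N_generator(3, [1, 0], [[-0.5, 0.5], [0, -0.5]],
                                [1, 0], [[-2, 2], [0, -2]])
levels = np.array([st[0] for st in states])
rows, cols = np.nonzero(Q)
print(len(states), np.allclose(Q.sum(axis=1), 0),
      bool(np.all(np.abs(levels[rows] - levels[cols]) <= 1)))

# With m = v = 1 the model reduces to M/M/1//N
N, tp, ta = 4, 3.0, 1.0
_, Q1 = ph_ph_1_N_generator(N, [1.0], [[-1 / tp]], [1.0], [[-1 / ta]])
w = np.array([factorial(N) / factorial(N - n) * (ta / tp) ** n for n in range(N + 1)])
print(np.allclose(stationary(Q1), w / w.sum()))
```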
There are three drawbacks to the recursive approach described to determine the mean waiting time. First, as just mentioned, the matrices are still large for large $N$. Furthermore, the size of the matrices is still a function of $N$. Second, one cannot obtain the solution for $N$ processors without investing the work to determine the solution for 1, 2, 3, ..., and $N-1$ processors. Sometimes this is a convenient built-in advantage. For instance, in this thesis we have continually been interested in the solution for 1, 2, 3, ..., $N$ processors, so a recursive solution based on the solution for $N-1$ processors is not a hindrance. In fact, the recursive solution is very efficient in a case like this since no extra work is performed. Third, as with all recursive computational procedures, small numerical errors can propagate, and accumulate, throughout the chain of calculations.

As a final remark, the recursive method really amounts to solving equations 2.6, 2.7, and 2.8; it just happens that the intermediate results solve the same problem for smaller $N$.



2.8 Multibus Model with Long Word Accesses

We now extend the model of the isolated Multibus considered so far to include long word accesses, as discussed when the processor model was introduced. Long word accesses are modeled as follows: at the end of the processing time interval, the processor decides with a probability $\beta$ that its memory access will be a long word access and with a probability $1-\beta$ that its memory access will be either a word or byte access. The probability $\beta$ is assumed identical for all processors and independent of the state of all other processors and memory accesses. A long word access actually requires two successive word accesses on the Multibus. With the Multibus system employed in Concert, there is an interval of 600 to 700 nanoseconds between these two accesses during which the processor releases control of the bus to any pending requests. Because of the round-robin arbitration on the Multibus, all the pending requests are served before the second access of the long word access. Therefore a long word access is essentially two independent accesses: 600 to 700 nsec after the first word access is completed, the request for the second word is generated and joins the end of the queue for Multibus service.


Figure 2.13: Basic Multibus model

Figure 2.14(a): Extended Multibus model

Figure 2.14(b): Class transition diagram of extended Multibus model (byte, word, or first word of long word access; completed access; recovery; second word of long word access)

The basic Multibus model, depicted in Figure 2.13 above, can be extended to include long word accesses. This extended Multibus model is depicted in Figure 2.14(a). Note that the circle labeled "processing" denotes all the processors which are processing and the circle labeled "recovery" denotes all the processors which are recovering: these circles do not denote individual processors. Figure 2.14(b) shows a class transition diagram of the model. The details of the model are as follows.

Let the request for a byte, word, or the first word of a long word access from any processor $i$ ($1 \le i \le N$) be represented by a customer of class 1. Upon completion of this access, the class 1 customer becomes either a class 2 customer with probability $1-\beta$ or a class 3 customer with probability $\beta$. Class 2 customers represent fully completed memory accesses (byte, word, and long word, i.e. both word accesses) and class 3 customers represent half completed long word accesses (only the first word access completed). Upon receiving a class 2 customer, processor $i$ begins processing, and after a time period $t_p$, governed by the processing time distribution, processor $i$ generates another request, represented as a class 1 customer. Upon receiving a class 3 customer, processor $i$ waits a recovery time $t_r$ (a random variable given by a recovery time distribution) before generating a class 4 customer, representing the request for the second word of a long word access. Upon completion of this second word access (all word accesses are governed by the same access time distribution), the class 4 customer becomes a class 2 customer and returns to processor $i$. Exactly $N$ customers are always somewhere in the closed loop of classes 1, 2, 3, and 4.

Conceptually there is no difference between:

Method 1: the processor deciding, when it generates a request, that the request corresponds to a long word access, and

Method 2: the server deciding, when it completes a word access, that the access corresponds to a long word access (and hence requires a second word access). (This method is depicted in Figure 2.14.)

In method 2 there is no need to distinguish between byte or word accesses and the first word of a long word access. Method 2 therefore requires one less class per processor than method 1.

The processing time random variable, $t_p$, at each processor is assumed to be identically distributed for all processors and independent of all other random variables. The recovery time random variable, $t_r$, at each processor is also assumed to be identically distributed for all processors and independent of all other random variables. Finally, the access time random variable, $t_a$, for each byte or word access is assumed to be identically distributed for all such accesses, irrespective of class, and independent of all other random variables.

2.8.1 Analysis of Model with Long Word Accesses

2.8.1.1 Asymptotic Behaviour

For sufficiently large $N$ the bus will constantly be in use, yielding a bus throughput of $1/\bar t_a$ word accesses per unit time. Since each processor cycle (processing time plus word or long word memory access) requires an average of $1+\beta$ word accesses, we obtain the throughput balance equation:

$$\frac{(1+\beta)N}{\bar t_{cyc}} = \frac{1}{\bar t_a} \tag{2.12}$$

where $\bar t_{cyc}$ is the average cycle time given by:

$$\bar t_{cyc} = \bar t_p + \bar t_{w_1} + \bar t_a + \beta\left(\bar t_r + \bar t_{w_2} + \bar t_a\right) \tag{2.13}$$

$\bar t_{w_1}$ is the average waiting time for a byte or word access or the first word access of a long word, and $\bar t_{w_2}$ is the average waiting time for the second word access of a long word.

In general $\bar t_{w_1} \ne \bar t_{w_2}$, since the waiting time of the second word of a long word access is correlated with the waiting time of the first word. For any particular long word access we have

$$t_{w_2} = \max\left(0,\ \sum_{i=1}^{n_{t_{w_1}} + n_{(t'_a + t_r)}} t_a^{(i)} - t_r\right)$$

where

$n_{t_{w_1}}$ is the number of requests joining the queue after a request (for the first word of a long word) during the waiting time $t_{w_1}$ of that request,

$n_{(t'_a + t_r)}$ is the number of requests joining the queue during the actual access time and recovery time $(t'_a + t_r)$ of the request (for the first word),

$t'_a$ denotes a particular sample of the access time distribution, and $t_a^{(i)}$ is the access time of the $i$-th of the intervening requests.

$n_{t_{w_1}} + n_{(t'_a + t_r)}$ is the total number of requests which arrive after the request for the first word but before the request for the second word of a long word access. The quantity $n_{t_{w_1}}$ is related to $t_{w_1}$, and thus $t_{w_1}$ and $t_{w_2}$ are correlated. In particular, $t_{w_2} = 0$ only if all requests that arrived in $t_{w_1} + t'_a + t_r$ are completely served in time $t_r$. Certainly, $t_{w_2} = 0$ is in general more difficult to attain the larger $t_{w_1}$ is, i.e. $t_{w_2} = 0$ is in general a stricter requirement than $t_{w_1} = 0$. Thus we expect $t_{w_1}$ and $t_{w_2}$ to have different probability distributions.

The mean total waiting time (or wasted time) per processor cycle is $\bar t_{wT} = \bar t_{w_1} + \beta\bar t_{w_2}$. Manipulating equations 2.12 and 2.13 we have:

$$\bar t_{wT} = (1+\beta)N\bar t_a - \bar t_p - (1+\beta)\bar t_a - \beta\bar t_r$$

If we normalize $\bar t_{wT}$ by the mean word access time $\bar t_a$, we have

$$\frac{\bar t_{wT}}{\bar t_a} = (1+\beta)N - \alpha - (1+\beta) - \beta\gamma \tag{2.14}$$

where, as before, $\alpha = \bar t_p/\bar t_a$, and $\gamma = \bar t_r/\bar t_a$. Equation 2.14 describes a function of $N$ with an asymptotic slope of $1+\beta$ and a knee at $1 + \frac{\alpha+\beta\gamma}{1+\beta}$. The effect of the long word accesses, through the parameter $\beta$, is to increase the asymptotic slope compared with the case with only word accesses. The knee increases with $\beta$ if $\gamma > \alpha$ and decreases with $\beta$ if $\gamma < \alpha$.
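A few lines of Python make the slope and knee of equation 2.14 concrete. The function names are illustrative, and α = 5, γ = .62 are example values only. Since the knee's derivative in β is $(\gamma - \alpha)/(1+\beta)^2$, the knee moves toward smaller N as β grows when γ < α, and toward larger N when γ > α.

```python
def wT_over_ta(N, alpha, beta, gamma):
    """Asymptotic form of equation 2.14: (1 + beta) N - alpha - (1 + beta) - beta gamma."""
    return (1 + beta) * N - alpha - (1 + beta) - beta * gamma


def knee(alpha, beta, gamma):
    """Value of N at which the line of equation 2.14 crosses zero."""
    return 1 + (alpha + beta * gamma) / (1 + beta)


for beta in (0.0, 0.5, 1.0):
    print(beta, round(knee(5.0, beta, 0.62), 3), wT_over_ta(10, 5.0, beta, 0.62))
```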

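Normalizing instead by the mean memory access time $\bar t_m = \bar t_a + \beta(\bar t_r + \bar t_a)$ yields:

$$\frac{\bar t_{wT}}{\bar t_m} = \frac{N}{1 + \frac{\beta\gamma}{1+\beta}} - \frac{\alpha}{(1+\beta) + \beta\gamma} - 1 \tag{2.15}$$

As a function of $N$, $\bar t_{wT}/\bar t_m$ has an asymptotic slope of $\frac{1}{1 + \beta\gamma/(1+\beta)}$, which is always less than or equal to 1, and a knee again at $1 + \frac{\alpha+\beta\gamma}{1+\beta}$.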
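Equation 2.15 is equation 2.14 divided by $\bar t_m/\bar t_a = 1 + \beta + \beta\gamma$. The Python sketch below (illustrative names, example parameter values) checks that identity and that the asymptotic slope never exceeds 1.

```python
def wT_over_ta(N, alpha, beta, gamma):
    """Equation 2.14 (asymptotic form)."""
    return (1 + beta) * N - alpha - (1 + beta) - beta * gamma


def wT_over_tm(N, alpha, beta, gamma):
    """Equation 2.15, with t_m = t_a (1 + beta + beta * gamma)."""
    return (N / (1 + beta * gamma / (1 + beta))
            - alpha / ((1 + beta) + beta * gamma) - 1)


def slope_tm(beta, gamma):
    """Asymptotic slope of t_wT / t_m as a function of N."""
    return 1 / (1 + beta * gamma / (1 + beta))


print(wT_over_tm(12, 4.0, 0.5, 0.62), slope_tm(0.5, 0.62))
```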
2.8.1.2 Deterministic Behaviour

Consider now the case when $t_p$, $t_a$, and $t_r$ are deterministic quantities. The maximum memory access time is $2t_a + t_r$. Regarding this as the access time and proceeding as in section 2.2, we obtain $t_w = 0$ for $N \le \lfloor t_p/(2t_a + t_r) \rfloor + 1$. In the actual Multibus $0 < t_r < t_a$ (see Appendix A). Taking $0 < t_r < t_a$ here, we find that queueing must occur (i.e. $\bar t_w > 0$) for $N > \lfloor t_p/(2t_a + t_r) \rfloor + 1$ when $\beta > 0$. The reason that $\bar t_w > 0$ under these conditions is that no request can be completely served in the recovery time (since $t_r < t_a$); thus in order to maintain $\bar t_w = 0$, only one request can be served in the entire $2t_a + t_r$ interval. However, this is impossible for $N > \lfloor t_p/(2t_a + t_r) \rfloor + 1$, hence some requests must occasionally wait. The case with $\beta = 0$ reduces to that discussed in section 2.2, for which no queueing occurs until $N > \lfloor t_p/t_a \rfloor + 1$.

In the actual Multibus $0 < t_r < t_p$ (see Appendix A). We can view the recovery time $t_r$ as a shortened processing time. Thus the processing time is $t_p$ with probability $1-\beta$ and $t_r$ with probability $\beta$ (with the restriction that one processing time of $t_p$ follows every processing time of $t_r$). When $N > \lfloor t_p/t_a \rfloor + 1$ and $\beta = 0$, we know from section 2.2 that the bus is always busy. The following theorem shows that the bus is in fact always busy when $N > \lfloor t_p/t_a \rfloor + 1$, regardless of the value of $\beta$.


Theorem 2.5

Consider the Multibus model with long word accesses described in the beginning of section 2.8. If

1) $t_p$ and $t_r$ are deterministic variables such that $0 < t_r < t_p$,

2) $t_a$ is a random variable with minimum value $t_{a_{\min}} > t_r$,

3) $N > \lfloor t_p/t_{a_{\min}} \rfloor + 1$, and

4) each of the $N$ processors has completed at least two memory accesses (byte, word, or first or second word access of a long word),

then the fraction of time that the bus is busy, denoted by $\rho$, is 1.
Proof:

Suppose to the contrary that $\rho < 1$. Then there must be at least one memory request such that the bus is idle immediately prior to that request. Choose one such memory request. Denote the time at which that request occurs by $\tau$ and the processor from which it originated by $k$. There are two cases to consider.

Case 1: At time $\tau$ processor $k$ just completed a processing time interval (of duration $t_p$).

Immediately prior to time $\tau - t_p$, each of the $N-1$ processors other than processor $k$ either must have a memory request pending (and waiting) or must be in the midst of a processing or a recovery period (since all processors have completed at least two memory accesses). All of these processors (if any) in the midst of a processing or recovery period must generate at least one memory request before time $\tau$. Therefore there must be at least $N-1$ memory requests pending or generated in the interval $(\tau - t_p, \tau)$. In order that the bus be idle immediately prior to time $\tau$, all of these memory requests must be completely served before time $\tau$. Since there are at least $N-1$ of these memory requests, we must at least have $(N-1)t_{a_{\min}} < t_p$. Or, since $N$ is an integer, we must have $N \le \lfloor t_p/t_{a_{\min}} \rfloor + 1$.

Case 2: At time $\tau$ processor $k$ just completed a recovery time interval (of duration $t_r$).

Since the bus is idle immediately prior to time $\tau$ and $t_{a_{\min}} > t_r$, there can be no memory requests pending or generated in the interval $[\tau - t_r, \tau)$. Furthermore, no memory requests can be pending or generated in the interval $(\tau - t_r - t_{a_{\min}}, \tau)$, otherwise the bus would not be idle immediately prior to time $\tau$. In order that there be no memory requests in the interval $(\tau - t_r - t_{a_{\min}}, \tau)$, all the other $N-1$ processors must be processing during that interval. Thus each of these $N-1$ processors must begin processing in the interval $(\tau - t_p, \tau - t_r - t_{a_{\min}}]$, implying that at least $N-2$ memory accesses occur in the interval $(\tau - t_p, \tau - t_r - t_{a_{\min}}]$. Therefore, counting the first word access of processor $k$, at least $N-1$ memory accesses occur in the interval $[\tau - t_p, \tau)$, i.e. $(N-1)t_{a_{\min}} < t_p$. Or, since $N$ is an integer, $N \le \lfloor t_p/t_{a_{\min}} \rfloor + 1$.

From Cases 1 and 2 we conclude that $N \le \lfloor t_p/t_{a_{\min}} \rfloor + 1$ is a necessary condition in order that $\rho < 1$. Since by hypothesis $N > \lfloor t_p/t_{a_{\min}} \rfloor + 1$, we must have $\rho = 1$.


Our throughput balance equation (equation 2.12) can be written for general $\rho$ as follows:

$$\frac{(1+\beta)N}{\bar t_{cyc}} = \frac{\rho}{\bar t_a}$$

We conclude from this that $\bar t_{wT}$ equals its asymptotic value for $N > \lfloor t_p/t_a \rfloor + 1$, since $\rho = 1$ for $N$ in this range.
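Theorem 2.5 can be checked with a small discrete-event simulation. The Python sketch below is an illustration written for this purpose, not the simulator used in this thesis. The following are all assumptions: the function name, the initial condition (every processor starts processing at time 0), the warm-up period (standing in for hypothesis 4), and the example values $t_p = 3$, $t_a = 1$, $t_r = 0.5$. These values are exact binary fractions, so event times carry no rounding error. With these values $\lfloor t_p/t_a \rfloor + 1 = 4$, so for $N \ge 5$ the measured busy fraction should be 1 whatever the value of β.

```python
import heapq
import itertools
import random
from collections import deque


def bus_busy_fraction(N, t_p, t_a, t_r, beta, horizon=20_000.0, warmup=1_000.0, seed=1):
    """Fraction of [warmup, horizon] during which the FCFS bus is busy."""
    rng = random.Random(seed)
    order = itertools.count()                   # tie-breaker for equal event times
    events = [(t_p, next(order), "arrive", k, 1) for k in range(N)]
    heapq.heapify(events)                       # all processors start processing at 0
    queue = deque()                             # FCFS queue of (processor, class)
    busy = False
    busy_time = 0.0

    def start_service(now):
        nonlocal busy, busy_time
        k, cls = queue.popleft()
        busy = True
        end = now + t_a
        busy_time += max(0.0, min(end, horizon) - max(now, warmup))
        heapq.heappush(events, (end, next(order), "finish", k, cls))

    while events:
        now, _, kind, k, cls = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "arrive":
            queue.append((k, cls))
            if not busy:
                start_service(now)
            continue
        busy = False                            # "finish": access of processor k done
        if cls == 1 and rng.random() < beta:
            # first word of a long word: recover, then request the second word (class 4)
            heapq.heappush(events, (now + t_r, next(order), "arrive", k, 4))
        else:
            # byte/word access or second word completed: processor resumes processing
            heapq.heappush(events, (now + t_p, next(order), "arrive", k, 1))
        if queue:
            start_service(now)
    return busy_time / (horizon - warmup)


t_p, t_a, t_r, beta = 3.0, 1.0, 0.5, 0.5
for N in (2, 4, 5, 6):
    print(N, round(bus_busy_fraction(N, t_p, t_a, t_r, beta), 4))
```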

Figure 2.15 illustrates representative cases of $\bar t_{wT}/\bar t_m$ vs. $N$ in the deterministic case.

Figure 2.15(a): $\beta = 0$ (knee: $\alpha + 1$; asymptotic slope: 1)

Figure 2.15(b): $\beta = 1$ (knee: $\frac{\alpha+\gamma}{2} + 1$; asymptotic slope: $\frac{2}{2+\gamma}$)
Figures 2.15(a) and 2.15(b) depict $\bar t_{wT}/\bar t_m$ for $\beta = 0$ and $\beta = 1$ respectively. ($N$ is treated as a continuous parameter in Figure 2.15, thus the floor functions are neglected.) Note also that $\alpha = t_p/t_a$ and $\gamma = t_r/t_a$. For $\beta > 0$ we have three cases:

1. for $N \le \lfloor t_p/(2t_a + t_r) \rfloor + 1$, $\bar t_{wT} = 0$,

2. for $\lfloor t_p/(2t_a + t_r) \rfloor + 1 < N \le \lfloor t_p/t_a \rfloor + 1$, $\bar t_{wT}$ is strictly positive and greater than or equal to its asymptotic value (assuming $t_a > 0$), and

3. for $N > \lfloor t_p/t_a \rfloor + 1$, $\bar t_{wT}$ equals its asymptotic value.

These three cases are illustrated in Figure 2.15(c), where $N_l = \lfloor t_p/(2t_a + t_r) \rfloor + 1$ and $N_u = \lfloor t_p/t_a \rfloor + 1$.
Figure 2.15(c): $0 < \beta < 1$ (knee: $\frac{\alpha+\beta\gamma}{1+\beta} + 1$; asymptotic slope: $\frac{1+\beta}{1+\beta+\beta\gamma}$)

The curves in Figure 2.15(c) are rounded in the knee area due to the randomness introduced by the probabilistic choice of word vs. long word access. Because of this rounding, the knee cannot always be interpreted as the maximum value of $N$ for which $\bar t_{wT} = 0$ can be maintained.

Finally, note that deterministic $t_p$, $t_r$, and $t_a$ yield a lower bound on $\bar t_{wT}$ over all possible stationary processing, recovery, and access time probability distributions, for all $N$ if $\beta = 0$ or 1, and at least for $N \le \lfloor \alpha/(2+\gamma) \rfloor + 1$ and $N > \lfloor \alpha \rfloor + 1$ if $0 < \beta < 1$.

2.8.1.3 Product Form Solution

The Multibus model with long word accesses that was presented earlier has a product form solution if the access time is exponentially distributed. The processing and recovery time distributions may be completely arbitrary.

Let the global state be $X = (x_P, y)$, where $x_P$ represents the state of the processors (where class 1 customers originate) and $y$ represents the state of the FCFS queue for Multibus service. The processors can be considered as comprising an infinite server, since there is always a free processor available for an arriving customer. Therefore the processors form a quasi-reversible queue (with respect to a Markovian state description). The exponentially distributed access time, independent of class, renders the FCFS queue quasi-reversible (again with respect to a Markovian state description). The quasi-reversibility of all the queues in isolation yields the product form.

Let $x_P = (n_P, n_R)$, where $n_P$ is the number of customers in class 2 (i.e. processing) and $n_R$ is the number of customers in class 3 (i.e. recovering).

Let $y = (n_{A_1}, n_{A_2})$, where $n_{A_1}$ is the number of customers in class 1 (i.e. byte or word or first access of long word) and $n_{A_2}$ is the number of customers in class 4 (i.e. second access of long word).

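Let $\lambda_j$ represent the effective arrival rate of class $j$ customers, $j = 1, \ldots, 4$. Then from the results in section 2.6.1 we have:

$$\pi(X) = C\,\frac{(\lambda_2\bar t_p)^{n_P}}{n_P!}\,\frac{(\lambda_3\bar t_r)^{n_R}}{n_R!}\,\frac{(n_{A_1}+n_{A_2})!}{n_{A_1}!\,n_{A_2}!}\,(\lambda_1\bar t_a)^{n_{A_1}}(\lambda_4\bar t_a)^{n_{A_2}}$$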


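Now $\lambda_3 = \lambda_4 = \beta\lambda_1$ and $\lambda_2 = (1-\beta)\lambda_1 + \lambda_4 = \lambda_1$. Thus the steady state probability of the global state $X = (n_P, n_R, n_{A_1}, n_{A_2})$ is

$$\pi(X) = C\,\lambda_1^{\,n_P+n_R+n_{A_1}+n_{A_2}}\,\frac{\bar t_p^{\,n_P}\,(\beta\bar t_r)^{n_R}\,\bar t_a^{\,n_{A_1}+n_{A_2}}\,\beta^{n_{A_2}}\,(n_{A_1}+n_{A_2})!}{n_P!\,n_R!\,n_{A_1}!\,n_{A_2}!}$$

Since $n_P + n_R + n_{A_1} + n_{A_2} = N$, we can rewrite this as:

$$\pi(X) = C\,\frac{\alpha^{n_P}(\beta\gamma)^{n_R}\,(n_{A_1}+n_{A_2})!\,\beta^{n_{A_2}}}{n_P!\,n_R!\,n_{A_1}!\,n_{A_2}!}$$

for some normalizing constant $C$.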

The mean number of requests for a byte, word, or the first word of a long word access in the FCFS queue is

$$\bar n_{A_1} = \sum_{n_P + n_R + n_{A_1} + n_{A_2} = N} n_{A_1}\,\pi(n_P, n_R, n_{A_1}, n_{A_2})$$

Similarly, the mean number of requests for the second word of a long word in the FCFS queue is:

$$\bar n_{A_2} = \sum_{n_P + n_R + n_{A_1} + n_{A_2} = N} n_{A_2}\,\pi(n_P, n_R, n_{A_1}, n_{A_2})$$

Clearly $\bar n_{A_2} = \bar n_{A_1}$ when $\beta = 1$ and $\bar n_{A_2} = 0$ when $\beta = 0$.
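If we let the global state be $X = (n_{A_1}, n_{A_2})$, then

$$\pi(X) = C\,\frac{(\alpha+\beta\gamma)^{N-n_{A_1}-n_{A_2}}}{(N-n_{A_1}-n_{A_2})!}\,\frac{(n_{A_1}+n_{A_2})!}{n_{A_1}!\,n_{A_2}!}\,\beta^{n_{A_2}}$$

and

$$\bar n_{A_2} = \beta\,C\sum_{\substack{n_{A_1}+n_{A_2}\le N\\ n_{A_2}\ge 1}} \frac{(\alpha+\beta\gamma)^{N-n_{A_1}-n_{A_2}}}{(N-n_{A_1}-n_{A_2})!}\,\frac{(n_{A_1}+n_{A_2})!}{n_{A_1}!\,(n_{A_2}-1)!}\,\beta^{n_{A_2}-1}$$

By renaming $n_{A_1}+1 \to n_{A_1}$ and $n_{A_2}-1 \to n_{A_2}$ in the above expression for $\bar n_{A_2}$, we see that $\bar n_{A_2} = \beta\,\bar n_{A_1}$, as one might have expected (naively) from the outset.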
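The relation $\bar n_{A_2} = \beta\,\bar n_{A_1}$ can also be confirmed by brute force. The Python sketch below (hypothetical function name, example parameter values) sums the product form $\alpha^{n_P}(\beta\gamma)^{n_R}(n_{A_1}+n_{A_2})!\,\beta^{n_{A_2}}/(n_P!\,n_R!\,n_{A_1}!\,n_{A_2}!)$ over all states with $n_P + n_R + n_{A_1} + n_{A_2} = N$; the normalizing constant cancels in the means.

```python
from math import factorial


def class_means(N, alpha, beta, gamma):
    """Mean numbers of class 1 and class 4 requests in the FCFS queue, by
    brute-force summation of the product form over nP + nR + nA1 + nA2 = N."""
    total = mean1 = mean2 = 0.0
    for a1 in range(N + 1):
        for a2 in range(N + 1 - a1):
            for p in range(N + 1 - a1 - a2):
                r = N - a1 - a2 - p
                w = (alpha**p * (beta * gamma)**r * factorial(a1 + a2) * beta**a2
                     / (factorial(p) * factorial(r) * factorial(a1) * factorial(a2)))
                total += w
                mean1 += a1 * w
                mean2 += a2 * w
    return mean1 / total, mean2 / total


n1, n2 = class_means(6, 3.0, 0.4, 0.62)
print(n1, n2, n2 / n1)          # the ratio should equal beta = 0.4
```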

If we let $X = (n_P, n_R, n_S)$, where $n_S$ is the total number of requests in the FCFS queue, then:

$$\pi(X) = C\,\frac{\alpha^{n_P}(\beta\gamma)^{n_R}}{n_P!\,n_R!}\,(1+\beta)^{n_S}$$

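Finally, if we let $X_N = (n_S)$, then

$$\pi_N(n_S) = C(\alpha+\beta\gamma)^N\,\frac{1}{(N-n_S)!}\left(\frac{1+\beta}{\alpha+\beta\gamma}\right)^{n_S} = C_N\,\frac{N!}{(N-n_S)!}\left(\frac{1+\beta}{\alpha+\beta\gamma}\right)^{n_S}$$

where

$$C_N^{-1} = \sum_{n_S=0}^{N}\frac{N!}{(N-n_S)!}\left(\frac{1+\beta}{\alpha+\beta\gamma}\right)^{n_S}$$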


Note that this is exactly the same result we obtained in section 2.4 for the M/M/1//N model if we replace $\lambda/\mu$ by $\frac{1+\beta}{\alpha+\beta\gamma}$, the ratio of mean service requirement per cycle to mean processor time (processing plus recovery) per cycle.

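The average number of requests in the queue is

$$\bar n_S = C_N\sum_{n_S=0}^{N} n_S\,\frac{N!}{(N-n_S)!}\left(\frac{1+\beta}{\alpha+\beta\gamma}\right)^{n_S}$$

and, since $\bar n_{A_1} + \bar n_{A_2} = \bar n_S$ and $\bar n_{A_2} = \beta\,\bar n_{A_1}$, we have $\bar n_{A_1} = \bar n_S/(1+\beta)$ and $\bar n_{A_2} = \beta\,\bar n_S/(1+\beta)$.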

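By Little's Law: $\bar t_{w_1} = \bar n_{A_1}/\lambda_1 - \bar t_a$, $\bar t_{w_2} = \bar n_{A_2}/\lambda_4 - \bar t_a$, and the mean waiting time for any access is $\bar t_w = \bar n_S/(\lambda_1 + \lambda_4) - \bar t_a$. Since $\lambda_4 = \beta\lambda_1$ and $\bar n_{A_2} = \beta\,\bar n_{A_1}$, we have $\bar t_{w_1} = \bar t_{w_2} = \bar t_w$. (In general $\bar t_w = \frac{\bar t_{w_1}}{1+\beta} + \frac{\beta\,\bar t_{w_2}}{1+\beta}$.)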

It is possible to arrive at $\bar t_{w_1} = \bar t_{w_2}$ (and hence $= \bar t_w$) via a simpler route. A closed network of quasi-reversible queues has the property that, at the instant a customer arrives at a queue, the probability distribution of all other customers is the same as the equilibrium distribution obtained if they were the only customers in the network (Kelly [K1]). An arriving customer essentially "sees" the network as it would behave in equilibrium without itself. Therefore class 1 and class 4 customers arriving at the FCFS queue each "see" the queue as it would behave in equilibrium with $N-1$ customers, i.e. each sees the same distribution of customers. Both classes of customers thus have the same waiting time distribution.

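Denoting the probability that the FCFS queue server (i.e. the Multibus) is busy by $\rho$, we have, again by Little's Law, $\rho = (\lambda_1 + \lambda_4)\bar t_a$. The bus utilization $\rho$ is given by $\rho = 1 - \pi_N(0) = 1 - C_N$. Therefore

$$\frac{\bar t_w}{\bar t_a} = \frac{\bar t_{w_1}}{\bar t_a} = \frac{\bar t_{w_2}}{\bar t_a} = \frac{\bar n_S}{\rho} - 1 = \frac{N}{1 - C_N} - 1 - \frac{\alpha+\beta\gamma}{1+\beta}$$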

This is the same result as obtained with the M/M/1//N model when, as just noted above, we replace $\alpha^{-1}$ in the M/M/1//N model by $(1+\beta)/(\alpha+\beta\gamma)$. Therefore $\bar{t}_w/\bar{t}_a$ is asymptotic to $N - (\alpha+\beta\gamma)/(1+\beta) - 1$ for large N (at least in the case when all processors are identical and all the queues are quasi-reversible in isolation). Since in this case we know that $\bar{t}_w = \bar{t}_{w_1} = \bar{t}_{w_2}$, it is easy to confirm this asymptotic behaviour. Equating throughputs for large N (where the bus is always busy, ρ = 1), we have:

$$\frac{(1+\beta)N}{\bar{t}_p + \bar{t}_{w_1} + \bar{t}_a + \beta(\bar{t}_r + \bar{t}_{w_2} + \bar{t}_a)} = \frac{1}{\bar{t}_a}$$

and thus $\bar{t}_w/\bar{t}_a \approx N - (\alpha+\beta\gamma)/(1+\beta) - 1$ for large N, as deduced by comparison with the result for the M/M/1//N model.
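The product-form result can be evaluated directly. The Python sketch below is our own illustration (the function names are hypothetical, not from this thesis); it assumes identical processors and the quasi-reversible case (exponential access time), and computes $\bar{t}_w/\bar{t}_a$ from the M/M/1//N formula with α replaced by $(\alpha+\beta\gamma)/(1+\beta)$. The empty-queue probability is accumulated as $P_0 = (a^N/N!)/\sum_{j=0}^{N} a^j/j!$, which is algebraically the same as normalizing $N!/(N-n)!\,a^{-n}$ over $n = 0,\dots,N$ but avoids overflowing $N!$.

```python
def mm1n_wait_ratio(n: int, a: float) -> float:
    """Mean wait per request, t_w / t_a, for the M/M/1//N (machine-repair) model.

    `a` plays the role of alpha = t_p / t_a.  P0 = (a**N / N!) / sum_j a**j / j!
    is accumulated term by term so that N! never has to be formed explicitly.
    """
    term = 1.0              # a**j / j! for j = 0
    total = 1.0
    for j in range(1, n + 1):
        term *= a / j
        total += term
    p_empty = term / total  # term now holds a**N / N!
    rho = 1.0 - p_empty     # bus utilization
    return n / rho - a - 1.0


def long_word_wait_ratio(n: int, alpha: float, beta: float, gamma: float) -> float:
    """t_w / t_a for the Multibus model with long word accesses (product-form case):
    the M/M/1//N result with alpha -> (alpha + beta*gamma) / (1 + beta)."""
    return mm1n_wait_ratio(n, (alpha + beta * gamma) / (1.0 + beta))


# Illustrative values only: gamma = .62 as measured, alpha and beta chosen freely.
for n in (1, 2, 4, 8, 16):
    print(n, round(long_word_wait_ratio(n, alpha=5.0, beta=0.5, gamma=0.62), 3))
```

For large N the printed values approach the straight-line asymptote $N - (\alpha+\beta\gamma)/(1+\beta) - 1$.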

Since $\bar{t}_{w_T} = \bar{t}_{w_1} + \beta\,\bar{t}_{w_2}$, the mean total waiting time per processor cycle is simply $\bar{t}_{w_T} = (1+\beta)\,\bar{t}_w$. Although $\bar{t}_{w_T}$ is more meaningful than $\bar{t}_w$ as an indication of throughput degradation, we choose to give the results in terms of $\bar{t}_w$ for three reasons. First, as just mentioned, the two are trivially related by a multiplicative constant. Second, $\bar{t}_w$, or more specifically $\bar{t}_w/\bar{t}_a$, unifies the results of the current model with the results of the earlier models and facilitates direct comparisons. Third, the asymptotic slope of $\bar{t}_w/\bar{t}_a$ with respect to N is 1, independent of all the other parameters, unlike that of $\bar{t}_{w_T}/\bar{t}_a$ (which is 1 + β). Thus graphical results for $\bar{t}_w$ can be presented without the possible clutter created by intersecting asymptotes.

Actual measurements (see Appendix A) indicate that $t_a$ = 1.04 μsec for reads and 1.06 μsec for writes and that $t_r$ = .65 μsec. Taking $\bar{t}_a$ = 1.05 μsec and $\bar{t}_r$ = .65 μsec yields γ = .62. The minimum possible value for $t_p$ is .60 or .70 μsec with almost equal probability; thus α > .62. Figure 2.16 shows $\bar{t}_w/\bar{t}_a$ vs. N for various combinations of α > .62 and 0 ≤ β ≤ 1 with γ = .62. Note that with β = 0 the model reduces to the G/M/1//N model.
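The parameter values quoted above follow from simple arithmetic on the Appendix A measurements. In this sketch, averaging the two equally likely minimum processing times (.60 and .70 μsec) is our reading of how the bound on α is obtained, not something the text states explicitly.

```python
# Measured Multibus timings from Appendix A, in microseconds.
t_a_read, t_a_write = 1.04, 1.06
t_r = 0.65

t_a = 1.05                      # value adopted in the text (midway between read and write)
gamma = t_r / t_a

# Assumption: the two equally likely minimum processing times are averaged.
t_p_min_candidates = (0.60, 0.70)
alpha_bounds = [t / t_a for t in t_p_min_candidates]
alpha_min_mean = sum(t_p_min_candidates) / len(t_p_min_candidates) / t_a

print(f"gamma = {gamma:.3f}")
print("alpha for each minimum t_p:", [round(a, 3) for a in alpha_bounds])
print(f"alpha for the mean minimum t_p = {alpha_min_mean:.3f}")
```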

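The mean waiting time per request is very sensitive to the value of β. Indeed, since

$$\frac{d}{d\beta}\left(\frac{\alpha+\beta\gamma}{1+\beta}\right) = \frac{\gamma-\alpha}{(1+\beta)^2} < 0 \qquad (\text{since } \alpha > \gamma),$$

the knee of $\bar{t}_w/\bar{t}_a$ varies from α + 1 (β = 0) to (α + γ)/2 + 1 (β = 1), which represents close to a 100% change in $\bar{t}_w$ (with respect to $\bar{t}_w$ for β = 1) for large α.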
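A short numerical check of the knee movement, as a sketch: `knee` evaluates $(\alpha+\beta\gamma)/(1+\beta) + 1$ and `knee_slope_in_beta` the derivative above; both names are our own, and the α values are illustrative.

```python
def knee(alpha: float, beta: float, gamma: float) -> float:
    """Knee of t_w/t_a versus N: (alpha + beta*gamma) / (1 + beta) + 1."""
    return (alpha + beta * gamma) / (1.0 + beta) + 1.0


def knee_slope_in_beta(alpha: float, beta: float, gamma: float) -> float:
    """d/dbeta of (alpha + beta*gamma)/(1 + beta), i.e. (gamma - alpha)/(1 + beta)**2."""
    return (gamma - alpha) / (1.0 + beta) ** 2


gamma = 0.62
for alpha in (1.0, 10.0, 100.0, 1000.0):
    k0, k1 = knee(alpha, 0.0, gamma), knee(alpha, 1.0, gamma)
    # Relative change of the knee position, measured with respect to beta = 1.
    print(f"alpha={alpha:7.1f}  knee(beta=0)={k0:8.2f}  knee(beta=1)={k1:8.2f}  "
          f"change={(k0 - k1) / k1:.1%}")
```

As α grows the relative change tends to 100%, matching the statement above.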

2.8.1.4 Simulations

In this section we explore the case when the access time is not exponentially distributed and thus the solution does not (in general) have the convenient product form of the previous section. As demonstrated in section 2.7.2, exact results could be obtained by the method of stages. However, this method requires substantial work and does not yield great insight. Approximate results could be obtained by a diffusion model as in Halachmi and Franta [H1] or by the methods discussed and referenced by Whitt [W2]. While such approximate results can yield a great deal of insight, they are more difficult to obtain in this case than in section 2.7, owing to the complexities added by long word accesses, and they are, of course, only approximate.

In order to obtain a qualitative understanding of the effect of different processing time distributions on the mean waiting time per request, we simulated the system with different α and β parameters for different processing time distributions. The access time distribution was kept deterministic throughout to approximate the actual Multibus access time distribution. The error in this approximation is presumably quite small since the variance of the actual access time is small (see section 3.3 in Appendix A). The results from all the previous models lead us to conjecture that the mean waiting time for a given processing time distribution and a given mean access time is minimized by a deterministic access time. Thus the mean waiting time with the actual access time distribution will likely only be greater. The recovery time distribution was also kept deterministic throughout.


Figure 2.16: Queueing network model of Multibus with long word accesses. Processing time: general, access time: exponential, γ = .62
Three different processing time distributions were considered: third order Erlangian (i.e. $E_3$), exponential, and third order hyperexponential (with parameters $a_1 = .6$, $a_2 = .3$, $a_3 = .1$ and $\lambda_1 = \lambda$, $\lambda_2 = .1\lambda$, $\lambda_3 = .001\lambda$, where $a_1/\lambda_1 + a_2/\lambda_2 + a_3/\lambda_3 = \bar{t}_p$). The chief difference between these distributions is in their coefficient of variation, defined as

$$C_{t_p} = \frac{\sigma_{t_p}}{\bar{t}_p}.$$

The coefficient of variation, $C_{t_p}$, is a measurement of the amount of variation or randomness about the mean, normalized by the mean. The following table gives $C_{t_p}$ for the three distributions considered.

Processing time distribution                      Coefficient of variation $C_{t_p}$
Erlangian ($E_3$)                                 $\sqrt{1/3} \approx .577$
Exponential                                       1
Hyperexponential ($H_3$, parameters as above)     $\approx 4.2$
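The tabulated coefficients of variation can be reproduced with a few lines of Python, assuming the hyperexponential parameters as read above; the function names are our own. For a mixture of exponentials with branch probabilities $a_i$ and rates $\lambda_i$, $E[t_p] = \sum a_i/\lambda_i$ and $E[t_p^2] = \sum 2a_i/\lambda_i^2$.

```python
import math


def cv_erlang(k: int) -> float:
    """Coefficient of variation of a k-stage Erlang distribution: 1/sqrt(k)."""
    return 1.0 / math.sqrt(k)


def cv_hyperexponential(probs, rates) -> float:
    """C = sigma / mean for a probabilistic mixture of exponentials."""
    mean = sum(p / r for p, r in zip(probs, rates))
    second_moment = sum(2.0 * p / r ** 2 for p, r in zip(probs, rates))
    return math.sqrt(second_moment - mean ** 2) / mean


lam = 1.0   # C does not depend on the overall rate scale lambda
print("E3         ", round(cv_erlang(3), 3))
print("exponential", round(cv_erlang(1), 3))
print("H3         ", round(cv_hyperexponential((0.6, 0.3, 0.1),
                                               (lam, 0.1 * lam, 0.001 * lam)), 3))
```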

The simulation results for α = 1.0, 5.0, 10.0 and β = 0, .5, 1.0 are presented in Figures 2.17, 2.18 and 2.19. Note that the vertical axis is the mean waiting time per access for any access (i.e. the first or second word of a long word), denoted by $\bar{t}_w$. We found in general that $\bar{t}_{w_1} \neq \bar{t}_{w_2}$, where $\bar{t}_{w_1}$ and $\bar{t}_{w_2}$ are the mean waiting times for the first and second word respectively. The difference $\bar{t}_{w_1} - \bar{t}_{w_2}$ increased with N and approached a constant as the mean waiting time approached its asymptotic value. These findings are consistent with the discussion in section 2.8.1.1: the waiting time for the second word of a long word access is correlated with the waiting time of the first word of the same long word access.


Figure 2.17: Queueing network model of Multibus with long word accesses. Processing time: Erlangian ($E_3$), access time: deterministic, γ = .62

Figure 2.18: Queueing network model of Multibus with long word accesses. Processing time: exponential, access time: deterministic.
A careful examination of Figures 2.17, 2.18, and 2.19 reveals that for any given α and β the curves differ only in the knee area. In each case, the mean waiting time in the knee area is least for the Erlangian distribution and greatest for the hyperexponential distribution. This finding is consistent with our findings with the previous models: the mean waiting time in the knee area generally increases as the "randomness" of the (processing and access) distributions increases. In each case, however, the change in the mean waiting time due to the different processing time distributions is much less than the change due to different values of the parameter β. For example, for α = 10.0, the Erlangian curve is at most about .2 below the corresponding curve for the exponential, and the hyperexponential curve is at most about .5 above it.

The difference in mean waiting times effected by exponential versus deterministic distributions for the access time can be ascertained by comparing Figures 2.5 and 2.18. The difference in mean waiting times is greatest in the knee area of the curves and increases with N, as observed with the earlier models. For α = 10.0, the difference is at most about .70, whereas changing β from .5 to 1.0 results in a change of at most about 1.5 in the mean waiting time. Therefore, for the distributions considered, the mean waiting time is more sensitive to the value of β than to the form of the distribution. Indeed, the value of β determines the asymptotic value of the mean waiting time and the location of the knee in the mean waiting time curve; the processing and access time distributions only determine the "sharpness" of the knee.

The above discussion suggests that it is best to study the factors influencing the parameter β, while perhaps assuming analytically tractable exponential distributions for the processing and access times, before studying in detail the effect of different distributions.
2.9 Multibus Model with Long Word and Ringbus Accesses

In this section the Multibus model discussed so far is interfaced with the Ringbus. As described in section 1.2.5, we have decomposed the overall Concert model into two models, the Multibus and the Ringbus, to make analysis tractable. When analyzing one model, the operation of the other is replaced by an equivalent lumped model. In this section we replace the Ringbus by its equivalent access time distribution. In the sequel we will be interested in approximating the Ringbus access time distribution by one with a small number of parameters (in particular a single parameter) so that we can easily solve for the interaction between the Multibus and Ringbus models. For now we consider the Ringbus access time distribution to be general and unspecified.

We can extend the Multibus model with long word accesses that was developed in section 2.8 to include Ringbus accesses. We regard a Ringbus access as occurring with probability ψ and a Multibus access as occurring with probability 1 - ψ; otherwise the model remains as in section 2.8. Actually, any Ringbus access begins as a Multibus access. The Ringbus interface board (RIB) determines which Multibus accesses are permitted to use the Ringbus based on the address at which the read and/or write is to be performed. Recall from section 1.3 that we term a memory operation (read and/or write) that occurs in the Ringbus address space (i.e. requires the Ringbus) a Ringbus access. Similarly, we call a memory operation that occurs in the Multibus address space (i.e. does not require any portion of the Ringbus) a Multibus access. Thus a Ringbus access requires mastership of the Multibus, but the actual access occurs in the Ringbus address space.

The new model can be described more precisely by introducing classes of customers as in section 2.8. We now require a total of seven classes; classes 1 through 4 are the same as in section 2.8.


Figure 2.20(a): Multibus model with Ringbus accesses

Figure 2.20(b): Class transition diagram

Figure 2.20(a) depicts the new model and Figure 2.20(b) shows a class transition diagram. As in Figure 2.14 in section 2.8, the circle in Figure 2.20(a) labeled "processing" denotes all processors which are processing, and the circles labeled "recovery" denote the processors which are recovering. The details of the model are as follows:

Let the request for a byte, word, or the first word of a long word access from any processor be represented by a customer of class 1 for a Multibus access or by a customer of class 5 for a Ringbus access. After a class 1 customer completes its access, it becomes either a class 2 customer with probability 1 - β or a class 3 customer with probability β, and returns to any free processor (all processors are considered identical). Class 2 customers represent fully completed memory accesses (byte, word, and long word with both accesses done), and class 3 customers represent half completed long word Multibus accesses (only the first word completed). Upon receiving a class 3 customer, a processor waits a time $t_r$ given by the recovery time distribution before generating a class 4 customer, representing the request for the second word of a long word Multibus access. Upon completion of this second word access, the class 4 customer becomes a class 2 customer and returns to any free processor.

With probability 1 - ψ this request is for a Multibus access and is represented by a customer of class 1; with probability ψ this request is for a Ringbus access and is represented by a customer of class 5.
After a class 5 customer completes its access, it becomes either a class 2 customer with probability 1 - β or a class 6 customer with probability β, and returns to any free processor. Class 6 customers represent half completed long word Ringbus accesses (only the first word completed). Upon receiving a class 6 customer, a processor waits a time $t_r$ given by the same recovery time distribution as before and then generates a class 7 customer, representing the request for the second word of a long word Ringbus access. Finally, upon completion of this second word access, the class 7 customer becomes a class 2 customer and returns to any free processor.

Customer classes 5, 6, and 7 are completely analogous to classes 1, 3, and 4 respectively, except that the former refer to Ringbus accesses and the latter to Multibus accesses. Exactly N customers are always somewhere in the closed loop of classes 1 through 7.
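The class transitions just described define a routing matrix, and solving its traffic equations gives the mean number of visits to each class per processor cycle. The sketch below is our own encoding of the transitions above (class 2 stands for the processing station; `routing` and `visit_ratios` are hypothetical names). In particular, it shows that each cycle makes 1 + β bus accesses on average, regardless of ψ.

```python
def routing(beta: float, psi: float) -> dict:
    """Class-to-class routing probabilities described in the text (class 2 = processing)."""
    return {
        2: {1: 1.0 - psi, 5: psi},        # new request: Multibus or Ringbus
        1: {2: 1.0 - beta, 3: beta},      # Multibus byte/word/first word done
        3: {4: 1.0},                      # recovery, then Multibus second word
        4: {2: 1.0},
        5: {2: 1.0 - beta, 6: beta},      # Ringbus byte/word/first word done
        6: {7: 1.0},                      # recovery, then Ringbus second word
        7: {2: 1.0},
    }


def visit_ratios(beta: float, psi: float) -> dict:
    """Solve the traffic equations v_j = sum_i v_i P_ij, normalized so that v_2 = 1.

    Every path from class 2 returns to class 2 within three steps, so repeated
    propagation from class 2 converges after a few passes.
    """
    P = routing(beta, psi)
    v = {c: 0.0 for c in P}
    v[2] = 1.0
    for _ in range(len(P)):
        new = {c: 0.0 for c in P}
        for i, row in P.items():
            for j, prob in row.items():
                new[j] += v[i] * prob
        new[2] = 1.0                      # normalization: one cycle per class-2 visit
        v = new
    return v


v = visit_ratios(beta=0.5, psi=0.2)
bus_visits = v[1] + v[4] + v[5] + v[7]
print({c: round(x, 3) for c, x in sorted(v.items())}, "bus accesses per cycle:", bus_visits)
```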

As in our previous model, the processing time distribution is identical for all processors, and the recovery time distribution is the same for all processors. There are two separate access time distributions: one for Multibus accesses and one for Ringbus accesses. The Multibus access time distribution is the same for all byte and word (first or second word of long word) Multibus accesses, and the Ringbus access time distribution is the same for all byte and word (first or second word of long word) Ringbus accesses. We denote the access time of a Multibus access by the random variable $t_{aMB}$, and the access time of a Ringbus access by the random variable $t_{aRB}$. The random variables $t_p$, $t_r$, $t_{aMB}$, and $t_{aRB}$ are each assumed to be independent of the other random variables and independent of all classes.

2.9.1 Analysis of Multibus Model with Long Word and Ringbus Accesses


2.9.1.1 Asymptotic Behaviour

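The Multibus throughput is now $\rho/[(1-\psi)\,\bar{t}_{aMB} + \psi\,\bar{t}_{aRB}]$, where ρ is the fraction of time (i.e. probability in steady state) that the Multibus is busy. The throughput balance equation is thus:

$$\frac{(1+\beta)N}{\bar{t}_{cyc}} = \frac{\rho}{\bar{t}_a} \qquad (2.17)$$

where $\bar{t}_a$ is the average access time given by $\bar{t}_a = (1-\psi)\,\bar{t}_{aMB} + \psi\,\bar{t}_{aRB}$, and $\bar{t}_{cyc}$ is the average cycle time given by $\bar{t}_{cyc} = \bar{t}_p + \bar{t}_{w_1} + \bar{t}_a + \beta(\bar{t}_r + \bar{t}_{w_2} + \bar{t}_a)$.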

As in section 2.8, $\bar{t}_{w_1}$ is the average waiting time for a byte or word access or the first word access of a long word, and $\bar{t}_{w_2}$ is the average waiting time for the second word access of a long word. Now, however, $\bar{t}_{w_1}$ and $\bar{t}_{w_2}$ refer to the average waiting time of both Multibus and Ringbus accesses. It is certainly possible to partition $\bar{t}_{w_1}$ and $\bar{t}_{w_2}$ each into one component for Multibus accesses and another for Ringbus accesses, but we choose to continue looking at the overall waiting time per request. Note that in general $\bar{t}_{w_1} \neq \bar{t}_{w_2}$, as discussed for the case in section 2.8.

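The mean total waiting time (or wasted time) per processor cycle is $\bar{t}_{w_T} = \bar{t}_{w_1} + \beta\,\bar{t}_{w_2}$. Combining this equation with the equation for $\bar{t}_{cyc}$ and equation 2.17 yields:

$$\bar{t}_{w_T} = \frac{(1+\beta)N\,\bar{t}_a}{\rho} - \bar{t}_p - \beta\,\bar{t}_r - (1+\beta)\,\bar{t}_a$$

As discussed in section 2.8, we choose to normalize $\bar{t}_{w_T}$ by the mean memory access time $\bar{t}_m = \bar{t}_a + \beta(\bar{t}_r + \bar{t}_a)$ in order to retain our earlier interpretation of the knee. Thus

$$\frac{\bar{t}_{w_T}}{\bar{t}_m} = \frac{N/\rho}{1 + \dfrac{\beta\gamma}{(1+\beta)(1+\psi(\delta-1))}} - \frac{\alpha}{\beta\gamma + (1+\beta)(1+\psi(\delta-1))} - 1 \qquad (2.18)$$

where $\alpha = \bar{t}_p/\bar{t}_{aMB}$, $\gamma = \bar{t}_r/\bar{t}_{aMB}$, and $\delta = \bar{t}_{aRB}/\bar{t}_{aMB}$.

As a function of N, $\bar{t}_{w_T}/\bar{t}_m$ has a knee at

$$N = \frac{\alpha+\beta\gamma}{(1+\beta)(1+\psi(\delta-1))} + 1$$

and an asymptotic slope of $\left[1 + \dfrac{\beta\gamma}{(1+\beta)(1+\psi(\delta-1))}\right]^{-1}$, which is always less than or equal to 1. As N → ∞, ρ → 1, so $\bar{t}_{w_T}/\bar{t}_m$ is asymptotic to equation 2.18 with ρ = 1.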
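Equation 2.18 and its knee and slope can be evaluated with a short Python sketch. The helper names are our own, all times are in units of $\bar{t}_{aMB}$, and the values chosen for ψ and δ in the example are hypothetical. The example also checks that the ρ = 1 asymptote crosses zero exactly at the knee.

```python
def access_ratio(psi: float, delta: float) -> float:
    """Mean access time t_a in units of t_aMB: (1 - psi) + psi*delta."""
    return 1.0 + psi * (delta - 1.0)


def asymptotic_slope(beta, gamma, psi, delta):
    return 1.0 / (1.0 + beta * gamma / ((1.0 + beta) * access_ratio(psi, delta)))


def total_wait_over_tm(n, rho, alpha, beta, gamma, psi, delta):
    """Equation 2.18: t_wT / t_m as a function of N and the bus utilization rho."""
    tm = beta * gamma + (1.0 + beta) * access_ratio(psi, delta)   # t_m / t_aMB
    return (n / rho) * asymptotic_slope(beta, gamma, psi, delta) - alpha / tm - 1.0


def knee(alpha, beta, gamma, psi, delta):
    return (alpha + beta * gamma) / ((1.0 + beta) * access_ratio(psi, delta)) + 1.0


# Illustrative parameters; psi and delta (= t_aRB / t_aMB) are hypothetical values.
params = dict(alpha=5.0, beta=0.5, gamma=0.62, psi=0.2, delta=3.0)
n_knee = knee(**params)
print("knee:", round(n_knee, 3),
      "slope:", round(asymptotic_slope(0.5, 0.62, 0.2, 3.0), 3),
      "rho=1 asymptote at the knee:", round(total_wait_over_tm(n_knee, 1.0, **params), 9))
```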


2.9.1.2 Deterministic Behaviour

Consider now the case when $t_p$, $t_r$, $t_{aMB}$, and $t_{aRB}$ are deterministic quantities. The maximum memory access time is $2t_{aRB} + t_r$ (assuming that $t_{aRB} > t_{aMB}$). Thus $\bar{t}_{w_T} = 0$ for

$$N \le \frac{t_p}{2t_{aRB} + t_r} + 1 = \frac{\alpha}{2\delta + \gamma} + 1 = N_l,$$

where $N_l$ corresponds to the knee when β = 1 and ψ = 1.

The following theorem shows that the bus is busy when $N > t_p/t_{aMB} + 1 = \alpha + 1 = N_u$, regardless of the value of β, and $N_u$ corresponds to the knee when β = 0 and ψ = 0.
Theorem 2.6

Consider the Multibus model with long word and Ringbus accesses described at the beginning of section 2.9. If $t_p$, $t_{aMB}$, $t_{aRB}$, and $t_r$ are deterministic variables such that $t_r < t_p$, $t_r > 0$, and $N > t_p/t_{aMB} + 1$, and if each of the N processors has completed at least two memory accesses (byte, word, or first or second word access of a long word), then the fraction of time that the bus is busy, denoted by ρ, is 1.

Proof:

Given by Theorem 2.5 with $t_{a_{\min}} = t_{aMB}$.

From Theorem 2.6 we conclude that $\bar{t}_{w_T}/\bar{t}_m$ equals its asymptotic value for $N > N_u$. For $N_l < N < N_u$ and 0 < β ≤ 1 and/or 0 < ψ ≤ 1, $\bar{t}_{w_T}/\bar{t}_m$ is strictly positive, again by an argument similar to that in section 2.8.1.2.
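The two thresholds of the deterministic case can be computed directly. In this sketch the function names are our own and the value of δ is hypothetical; it classifies each N as below $N_l$ (no waiting), above $N_u$ (bus saturated by Theorem 2.6), or in between.

```python
def no_wait_limit(alpha: float, gamma: float, delta: float) -> float:
    """N_l = alpha / (2*delta + gamma) + 1 (deterministic case, t_aRB > t_aMB assumed)."""
    return alpha / (2.0 * delta + gamma) + 1.0


def saturation_limit(alpha: float) -> float:
    """N_u = alpha + 1: for N above this, Theorem 2.6 gives rho = 1."""
    return alpha + 1.0


alpha, gamma, delta = 10.0, 0.62, 2.0          # delta is a hypothetical value
n_l, n_u = no_wait_limit(alpha, gamma, delta), saturation_limit(alpha)
print(f"N_l = {n_l:.3f}, N_u = {n_u:.1f}")
for n in range(1, 14):
    if n <= n_l:
        region = "no waiting"
    elif n > n_u:
        region = "bus saturated"
    else:
        region = "between N_l and N_u"
    print(n, region)
```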

The three possible cases are depicted in Figure 2.21. As discussed in section 2.8.1.2, the curve in Figure 2.21(c) is rounded in the knee area due to the randomness introduced by the probabilistic choice of Multibus versus Ringbus access and word versus long word access.

Figure 2.21(a): β = 0 and ψ = 0. Knee: α + 1. Asymptotic slope: 1
2.9.1.3 Product Form Solution

The FCFS queue for the Multibus is no longer quasi-reversible in general, since the service time depends on the class of the customer: Ringbus accesses may have a different service time distribution than Multibus accesses. Certainly, the FCFS queue remains quasi-reversible if ψ = 0 and the Multibus access time distribution is exponential, or if ψ = 1 and the Ringbus access time distribution is exponential. The analysis in either of these two cases is the same as in section 2.8. However, we are interested in the general case when 0 < ψ < 1. Since the FCFS queue is not quasi-reversible for 0 < ψ < 1 (unless the Multibus and Ringbus access time distributions are identical), we cannot use the product form results in section 2.6.1 to give an exact result (no product form solutions are known for non-symmetric FCFS queues). We can, however, find an exact product form solution for a slightly different model than the one in which we are interested.


Consider the model presented at the beginning of section 2.9 with general processing, recovery, and access time distributions. Obtain a new model by replacing the FCFS queue for the Multibus by a server-sharing queue. (A server-sharing queue is essentially a round-robin queue with infinitesimal quantum size, so all queued customers are in service simultaneously.) Since the server-sharing queue is quasi-reversible, this new model has an exact product form solution. We will now derive the exact solution for this new model and use it to approximate the solution of our original model with an FCFS queue.


Let the global state be $X = (x_p, y)$, where $x_p$ represents the state of the processors and y represents the state of the server-sharing queue for use of the Multibus. As in section 2.8.1.2, the processors can be considered as comprising an infinite server, and thus they form a quasi-reversible queue (with respect to a Markovian state description). As mentioned earlier, the server-sharing queue is also quasi-reversible (with respect to a Markovian state description). The quasi-reversibility of all the queues in isolation yields the product form:


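Let $x_p = (n_2, n_3, n_6)$ and $y = (n_1, n_5, n_4, n_7)$, where $n_i$ is the number of customers in class i. Let $\lambda_i^{\text{eff}}$ represent the effective arrival rate of class i customers. Then from the results in section 2.6.1 we have:

$$\pi(X) = C\,\frac{(\lambda_2^{\text{eff}}\bar{t}_p)^{n_2}}{n_2!}\,\frac{(\lambda_3^{\text{eff}}\bar{t}_r)^{n_3}}{n_3!}\,\frac{(\lambda_6^{\text{eff}}\bar{t}_r)^{n_6}}{n_6!}\;(n_1+n_4+n_5+n_7)!\,\frac{(\lambda_1^{\text{eff}}\bar{t}_{aMB})^{n_1}(\lambda_5^{\text{eff}}\bar{t}_{aRB})^{n_5}(\lambda_4^{\text{eff}}\bar{t}_{aMB})^{n_4}(\lambda_7^{\text{eff}}\bar{t}_{aRB})^{n_7}}{n_1!\,n_5!\,n_4!\,n_7!}$$

Now $\lambda_1^{\text{eff}} = (1-\psi)\lambda_2^{\text{eff}}$, $\lambda_5^{\text{eff}} = \psi\lambda_2^{\text{eff}}$, $\lambda_3^{\text{eff}} = \lambda_4^{\text{eff}} = \beta(1-\psi)\lambda_2^{\text{eff}}$, and $\lambda_6^{\text{eff}} = \lambda_7^{\text{eff}} = \beta\psi\lambda_2^{\text{eff}}$.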
Simplifying, we finally obtain:


'x  =■ 


(2.19) 


ip  i. 


where $n_p = n_2$, $n_s = n_1 + n_4 + n_5 + n_7$, and $n_{47} = n_4 + n_7$. As before, $\alpha = \bar{t}_p/\bar{t}_{aMB}$, $\gamma = \bar{t}_r/\bar{t}_{aMB}$, and $\delta = \bar{t}_{aRB}/\bar{t}_{aMB}$. Note that equation 2.19 is exactly the same as equation 2.16 in section 2.8.1.3 except for the $(1+\psi(\delta-1))^{n_s}$ term. We can imagine a similar term in equation 2.16, i.e. $1^{(n_1+n_4)}$, and thus both have exactly the same form.
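The step from the product form to the queue-length distribution of point (1) can be sketched as follows. This is our own derivation, assuming the server-sharing product form with the $(n_1+n_4+n_5+n_7)!$ factor and the effective rates $\lambda_1^{\text{eff}} = (1-\psi)\lambda_2^{\text{eff}}$, $\lambda_5^{\text{eff}} = \psi\lambda_2^{\text{eff}}$, $\lambda_3^{\text{eff}} = \lambda_4^{\text{eff}} = \beta(1-\psi)\lambda_2^{\text{eff}}$, $\lambda_6^{\text{eff}} = \lambda_7^{\text{eff}} = \beta\psi\lambda_2^{\text{eff}}$ implied by the class transitions; $n_r = n_3 + n_6$ and $n_s = n_1 + n_4 + n_5 + n_7$ are shorthand.

```latex
% Divide by (\lambda_2^{eff}\,\bar t_{aMB})^N (one factor per customer, N customers in total):
\pi(X) \propto \frac{\alpha^{n_2}}{n_2!}\,
  \frac{(\beta(1-\psi)\gamma)^{n_3}}{n_3!}\,\frac{(\beta\psi\gamma)^{n_6}}{n_6!}\;
  \frac{n_s!}{n_1!\,n_4!\,n_5!\,n_7!}\,
  (1-\psi)^{n_1}\,(\beta(1-\psi))^{n_4}\,(\psi\delta)^{n_5}\,(\beta\psi\delta)^{n_7}

% Sum over n_3 + n_6 = n_r (binomial theorem) and n_1 + n_4 + n_5 + n_7 = n_s (multinomial theorem),
% using (1-\psi) + \beta(1-\psi) + \psi\delta + \beta\psi\delta = (1+\beta)(1+\psi(\delta-1)):
\mathrm{Prob}(n_2, n_r, n_s) \propto \frac{\alpha^{n_2}}{n_2!}\,\frac{(\beta\gamma)^{n_r}}{n_r!}\,
  \bigl[(1+\beta)(1+\psi(\delta-1))\bigr]^{n_s}

% Sum over n_2 + n_r = N - n_s (binomial theorem again):
\mathrm{Prob}(n_s) \propto \frac{(\alpha+\beta\gamma)^{N-n_s}}{(N-n_s)!}\,
  \bigl[(1+\beta)(1+\psi(\delta-1))\bigr]^{n_s}
  \propto \frac{N!}{(N-n_s)!}\left(\frac{(1+\beta)(1+\psi(\delta-1))}{\alpha+\beta\gamma}\right)^{n_s}
```

The last line is the M/M/1//N distribution with $\alpha^{-1}$ replaced by $(1+\beta)(1+\psi(\delta-1))/(\alpha+\beta\gamma)$.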

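Using the results of section 2.8.1.3 we immediately have:

1) the steady-state probability of a total of $0 \le n_s \le N$ requests in the queue is

$$\text{Prob}(n_s \text{ in queue}) = C'\,\frac{N!}{(N-n_s)!}\left(\frac{(1+\beta)(1+\psi(\delta-1))}{\alpha+\beta\gamma}\right)^{n_s}$$

where C' is a normalizing constant;

2) $\bar{t}_{w_1} = \bar{t}_{w_2} = \bar{t}_w$;

3)

$$\frac{\bar{t}_w}{\bar{t}_a} = \frac{N}{1-C'} - \frac{\alpha+\beta\gamma}{(1+\beta)(1+\psi(\delta-1))} - 1, \qquad C' = \left[\sum_{n_s=0}^{N}\frac{N!}{(N-n_s)!}\left(\frac{(1+\beta)(1+\psi(\delta-1))}{\alpha+\beta\gamma}\right)^{n_s}\right]^{-1}$$

where $\bar{t}_a = (1-\psi)\,\bar{t}_{aMB} + \psi\,\bar{t}_{aRB}$.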

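Points (1) and (3) are the same results as obtained with the M/M/1//N model in section 2.4 (equations 2.1 and 2.2) when $\alpha^{-1}$ in the M/M/1//N model is replaced by $(1+\beta)(1+\psi(\delta-1))/(\alpha+\beta\gamma)$. Therefore, with a server-sharing queue, $\bar{t}_w/\bar{t}_a$ is asymptotic to $N - (\alpha+\beta\gamma)/[(1+\beta)(1+\psi(\delta-1))] - 1$ for large N.

Point (2) implies that $\bar{t}_{w_T} = (1+\beta)\,\bar{t}_w$. Thus

$$\frac{\bar{t}_{w_T}}{\bar{t}_m} = \frac{(1+\beta)\,\bar{t}_w}{(1+\beta)\,\bar{t}_a + \beta\,\bar{t}_r} = \frac{\bar{t}_w/\bar{t}_a}{1 + \dfrac{\beta\,\bar{t}_r}{(1+\beta)\,\bar{t}_a}}$$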
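Points (1) to (3) translate into a few lines of Python. This sketch is our own (hypothetical function names; times in units of $\bar{t}_{aMB}$; ψ and δ in the example are hypothetical). C' is computed as the probability that the Multibus queue is empty, using the same overflow-free accumulation as for the M/M/1//N model.

```python
def server_sharing_wait_ratio(n, alpha, beta, gamma, psi, delta):
    """t_w / t_a for the server-sharing model: M/M/1//N with alpha replaced by
    (alpha + beta*gamma) / ((1 + beta)(1 + psi*(delta - 1)))."""
    a_eff = (alpha + beta * gamma) / ((1.0 + beta) * (1.0 + psi * (delta - 1.0)))
    term, total = 1.0, 1.0
    for j in range(1, n + 1):            # term = a_eff**j / j!
        term *= a_eff / j
        total += term
    c_prime = term / total                # C' = probability the queue is empty
    return n / (1.0 - c_prime) - a_eff - 1.0


def total_wait_over_tm(n, alpha, beta, gamma, psi, delta):
    """t_wT / t_m = (t_w / t_a) / (1 + beta * t_r / ((1 + beta) * t_a))."""
    tr_over_ta = gamma / (1.0 + psi * (delta - 1.0))
    return server_sharing_wait_ratio(n, alpha, beta, gamma, psi, delta) / (
        1.0 + beta * tr_over_ta / (1.0 + beta))


for n in (2, 4, 8, 16):
    print(n, round(server_sharing_wait_ratio(n, 5.0, 0.5, 0.62, 0.2, 3.0), 3),
          round(total_wait_over_tm(n, 5.0, 0.5, 0.62, 0.2, 3.0), 3))
```

With ψ = 0 the function reduces to the section 2.8 result, as noted in the text.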


To gauge the accuracy of the result for the server-sharing model as an approximation for the original model, consider the Multibus model with long word accesses in section 2.8. The product form solution of this model is exactly the same for the FCFS and server-sharing disciplines at the Multibus queue. However, the product form solution with the server-sharing discipline is more comprehensive: it is exact for general distributions for the processing, recovery, and access times (i.e. it is not limited to an exponential access time distribution as with the FCFS discipline). Since the product form solution is the same for the FCFS and server-sharing disciplines, the simulations reported in section 2.8.1.4 may be used to determine the accuracy of the solution for the server-sharing discipline in approximating the solution for the FCFS discipline. We reach the same conclusion as in section 2.8.1.4: the approximation is excellent except in the knee area, and in general $\bar{t}_{w_1} \neq \bar{t}_{w_2}$. In the knee area, $\bar{t}_w$ with the server-sharing discipline is too large for some processing time distributions (those with $C_{t_p} < 1$, it seems) and too small for others (those with $C_{t_p} > 1$, it seems). Extrapolating, we expect roughly the same accuracy from our server-sharing model in section 2.9.1.3 as an approximation for the original FCFS model.

It is important to temper the previous sentence with the observation that we are basing our extrapolation to the case with a general Ringbus access time distribution (of possibly large variance) on the simulations performed for deterministic access times. However, the accuracy of the server-sharing model will likely remain very good except around the knee area, where we expect the greatest inaccuracies to accrue. We have chosen not to perform any simulations to determine further the accuracy of our server-sharing model. The reason is that, as in section 2.8.1.3, we expect the mean waiting time to be more sensitive to the values of the parameters, such as β and ψ, than to the exact form of the probability distributions. Therefore it seems best to study the factors influencing the parameters before studying the effect of the probability distributions.

2.9.1.4 A Special Case

In the special case when the processing time is exponentially distributed and there are no long word accesses (i.e. β = 0), an exact result for the average waiting time per request can be obtained from the M/G/1//N results in section 2.5. Since there are no long word accesses, we can combine the Multibus access time and Ringbus access time distributions into one access time distribution. Specifically, if the Multibus access time distribution is $\text{Prob}(t_{aMB} \le t) = F_{aMB}(t)$ and the Ringbus access time distribution is $\text{Prob}(t_{aRB} \le t) = F_{aRB}(t)$, then the overall access time distribution is $F_a(t) = (1-\psi)F_{aMB}(t) + \psi F_{aRB}(t)$. The average waiting time can be determined by applying the formulae in section 2.5 with

$$F_a^*(s) = \int_0^\infty e^{-st}\,dF_a(t).$$
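Because the transform is linear, the combined access time transform is simply the same ψ-weighted mixture of the two component transforms. The sketch below illustrates this with a deterministic Multibus access and an exponential Ringbus access; these component choices, the 3.0 μsec Ringbus mean, ψ = 0.2, and all function names are hypothetical, and the M/G/1//N formulae of section 2.5 themselves are not reproduced here.

```python
import math


def lst_deterministic(t):
    """Laplace-Stieltjes transform of a constant time t: F*(s) = exp(-s t)."""
    return lambda s: math.exp(-s * t)


def lst_exponential(mean):
    """LST of an exponentially distributed time with the given mean: 1 / (1 + s mean)."""
    return lambda s: 1.0 / (1.0 + s * mean)


def lst_mixture(psi, lst_mb, lst_rb):
    """F_a = (1 - psi) F_aMB + psi F_aRB, so the transforms mix the same way."""
    return lambda s: (1.0 - psi) * lst_mb(s) + psi * lst_rb(s)


def mean_from_lst(lst, h=1e-5):
    """Mean access time = -dF*/ds at s = 0, estimated by a central difference."""
    return (lst(-h) - lst(h)) / (2.0 * h)


# Hypothetical values: 1.05 usec deterministic Multibus access, exponential Ringbus
# access with a 3.0 usec mean, psi = 0.2.
f_a = lst_mixture(0.2, lst_deterministic(1.05), lst_exponential(3.0))
print(f_a(0.0), round(mean_from_lst(f_a), 6))
```

The printed mean should match $(1-\psi)\bar{t}_{aMB} + \psi\,\bar{t}_{aRB}$.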

2.9.2 The Single Processor Equivalent of the Multibus

As discussed in section 1.2.4, we have decomposed the overall model of Concert into Multibus and Ringbus models, and when dealing with one of these models we replace the other models by equivalent models. Up to this point we have examined the Multibus model: we have assumed some Ringbus access time distribution and determined the performance of the Multibus model with that distribution. Now we examine the single processor equivalent model of the Multibus.

The single processor equivalent of the Multibus is characterized by a processing time distribution, Ringbus destination probabilities, and β = 0, ψ = 1 (as discussed in section 1.2.4). The processing time distribution presents the most difficulty: we must find the probability distribution of the Ringbus spacing.¹ The Ringbus destination probabilities are trivial to determine. Since we have assumed that all processors in the Multibus model are identical, the Ringbus destination probabilities for the entire Multibus are the same as those for one processor. Thus the Ringbus destination probabilities for the single processor equivalent, denoted by $p_i^{MBeqv}$, are given by $p_i^{MBeqv} = p_i$ for all i, where the Ringbus destination probabilities for each processor in the Multibus model are denoted by $p_i$.

The Ringbus spacing probability distribution is very difficult to find in closed form. Instead, we choose to approximate the Ringbus spacing distribution by another distribution with the same first moment. We could also use higher moments in the approximation of the Ringbus spacing distribution, thereby achieving greater accuracy. However, higher moments are progressively more difficult to obtain from the Multibus model. We therefore choose to stick with our simple first moment approximation and evaluate the results before considering more complex and accurate approximations. Indeed, the results so obtained may be sufficiently accurate that more accurate approximations are unnecessary. To ease analysis, we choose an exponential distribution, which is completely parameterized by its first moment, to approximate the Ringbus spacing distribution. Recall from section 1.2.4 that the processing time probability distribution of the single processor equivalent is equal to the Ringbus spacing probability distribution. Thus we have just approximated the processing time distribution of the single processor equivalent by an exponential distribution. Let the mean of this distribution be denoted by $\bar{t}_p^{MBeqv}$.

The Ringbus access time distribution is also very difficult to find in closed form (as we shall see in Chapter 3). For the same reasons as above, we choose to also approximate the Ringbus access time distribution by an exponential distribution. Since both the processing time distribution of the single processor equivalent and the Ringbus access time distribution are thus completely specified by their respective first moments, integrating the Multibus and Ringbus models reduces to first moment matching, rather than the (considerably) more difficult task of matching continuous distributions.


We now determine the mean processing time of the single processor equivalent of the Multibus in terms of Multibus parameters. The mean time between initiations of Ringbus accesses is $\bar{t}_p^{MBeqv} + \bar{t}_{aRB}$. This mean time is also given by $\bar{t}_{cyc}/[(1+\beta)\psi N]$, where

¹ In section 1.2.4 we defined the Ringbus spacing to be the interval between the completion of one access on

$\bar{t}_{cyc} = \bar{t}_p + \bar{t}_{w_1} + \bar{t}_a + \beta(\bar{t}_r + \bar{t}_{w_2} + \bar{t}_a)$ and $\bar{t}_a = (1-\psi)\,\bar{t}_{aMB} + \psi\,\bar{t}_{aRB}$. Thus the mean processing time of the single processor equivalent is given by

$$\bar{t}_p^{MBeqv} = \frac{\bar{t}_{cyc}}{(1+\beta)\psi N} - \bar{t}_{aRB} \qquad (2.20)$$


To proceed further we require a relationship between $\bar{t}_{w_T} = \bar{t}_{w_1} + \beta\,\bar{t}_{w_2}$ and the Multibus parameters. We choose to use the exact results for the server-sharing queue model developed in section 2.9.1.3 to approximate the general case. There is, of course, some error involved in this approximation, but at least we have a convenient result expressing the relationship between $\bar{t}_w$ and the Multibus parameters. As discussed in section 2.9.1.3, the server-sharing queue model should give fairly accurate results for $\bar{t}_w$ except around the knee area. Substituting the equation for $\bar{t}_{cyc}$ into equation 2.20 we have:


$$\bar t_p^{MBeqv} = \frac{\bar t_p + \bar t_W + (1-\beta)\bar t_{aMB} + \beta\bar t_{aRB}}{\beta N} - \bar t_{aRB}$$

where $\bar t_W$ is the mean waiting time per request for the M/M/1//N model of section 2.4 (equations 2.1 and 2.2) with mean access time $\bar t_a = (1-\beta)\bar t_{aMB} + \beta\bar t_{aRB}$.

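For small $N$, $\bar t_W \approx 0$, and thus

$$\bar t_p^{MBeqv} \approx \frac{\bar t_p + (1-\beta)\bar t_{aMB} + \beta\bar t_{aRB}}{\beta N} - \bar t_{aRB},$$

so $\bar t_p^{MBeqv}$ is approximately linear in $1/N$ for small $N$. For large $N$ the Multibus saturates, so $\bar t_{cyc} \approx N\bar t_a$, and thus

$$\bar t_p^{MBeqv} \approx \frac{(1-\beta)\,\bar t_{aMB}}{\beta},$$

a constant.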
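As a numerical check of equation 2.20 and its two limits, the following Python sketch computes $\bar t_p^{MBeqv}$ for several values of $N$. As in the text, it assumes that $\bar t_W$ is the waiting time of an M/M/1//N queue with mean access time $\bar t_a = (1-\beta)\bar t_{aMB} + \beta\bar t_{aRB}$. It does not reproduce equations 2.1 and 2.2. Instead it obtains $\bar t_W$ by exact mean value analysis (MVA), which is a swapped-in technique. The function names and parameter values are hypothetical.

```python
def mm1n_response_time(n_procs: int, t_p: float, t_a: float) -> float:
    """Mean time a request spends at the Multibus (waiting plus access) in an
    M/M/1//N queue with mean processing time t_p and mean access time t_a,
    computed by exact mean value analysis (MVA)."""
    q = 0.0  # mean number of requests at the Multibus with one processor fewer
    r = 0.0
    for n in range(1, n_procs + 1):
        r = t_a * (1.0 + q)   # an arriving request sees the (n - 1)-processor queue
        x = n / (t_p + r)     # throughput, by Little's law over a whole cycle
        q = x * r             # Little's law applied to the Multibus alone
    return r


def t_p_mbeqv(n_procs: int, t_p: float, t_amb: float, t_arb: float,
              beta: float) -> float:
    """Mean processing time of the single processor equivalent (equation 2.20)."""
    t_a = (1.0 - beta) * t_amb + beta * t_arb          # mean access time
    t_w = mm1n_response_time(n_procs, t_p, t_a) - t_a  # mean waiting time
    t_cyc = t_p + t_w + t_a                            # mean processor cycle
    # Ringbus accesses are initiated at aggregate rate beta * N / t_cyc.
    return t_cyc / (beta * n_procs) - t_arb


# Hypothetical parameters: t_p = 10, t_aMB = 1, t_aRB = 2, beta = 0.25.
for n in (1, 2, 4, 16, 64, 200):
    print(n, round(t_p_mbeqv(n, 10.0, 1.0, 2.0, 0.25), 4))
```

With these numbers the sketch gives 43 for $N = 1$, where there is no waiting. As $N$ grows it approaches $(1-\beta)\bar t_{aMB}/\beta = 3$.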


As we shall see in section 3.9.1, we need one more quantity from the single processor equivalent of the Multibus when we integrate the Multibus and Ringbus models. We denote this quantity by $P_{RB}$. It is the probability that, at the termination of a Ringbus access, the Multibus queue is nonempty and the request at the head of the queue is a Ringbus request. In other words,

$$P_{RB} = \Pr\{\text{a customer departing from the Multibus queue leaves a Ringbus request at the head of the queue} \mid \text{the departing customer is a Ringbus request}\}$$

the Multibus with a Ringbus destination and the start of the next such access on the Multibus.
A closed network of quasi-reversible queues has the following property. When a customer of a given class arrives at or departs from a queue, the other customers in the system are distributed according to the steady-state probability distribution that would hold if they were the only customers in the system [Theorem 3.12 of Ref. K1]. Thus $P_{RB}$ in the server-sharing queue approximation of the general case is given by the steady-state probability that the customer in service (i.e. at the head of the queue) represents a Ringbus access in an $N-1$ processor system. ($N$ is the number of processors in the original system.) We denote this probability by $p_{RB}^{N-1}$. Let $p^{N-1}$ denote the steady-state probability of there being any customer in service in an $N-1$ processor system.

Let $\lambda^{N-1}$ and $\lambda_{RB}^{N-1}$ denote the throughput of all requests and of Ringbus requests, respectively, in the $N-1$ processor system. Using Little's law we have $p_{RB}^{N-1} = \lambda_{RB}^{N-1}\,\bar t_{aRB}$ and $p^{N-1} = \lambda^{N-1}\,\bar t_a$. From section 2.9.1.3 we have

$$p_{RB}^{N-1} = p^{N-1}\,\frac{\beta\,\bar t_{aRB}}{(1-\beta)\bar t_{aMB} + \beta\,\bar t_{aRB}}$$
We have already noted that the server-sharing queue model in section 2.9.1.3 has the same solution for $\bar t_W/\bar t_a$ as an M/M/1//N queue model with mean access time $\bar t_a$. The same holds for the probability that the server is busy. That is, $p^{N-1}$ is the probability that the server is busy in an M/M/1//N-1 system with the same $\bar t_p$ and $\bar t_a$. Finally,

$$P_{RB} = p_{RB}^{N-1} = p^{N-1}\,\frac{\beta\,\bar t_{aRB}}{(1-\beta)\bar t_{aMB} + \beta\,\bar t_{aRB}}.$$
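The computation of $P_{RB}$ can be sketched numerically in Python. The busy probability $p^{N-1}$ of an M/M/1//N-1 queue follows from its throughput by Little's law. Scaling it by the Ringbus share of server time, $\beta\bar t_{aRB}/\bar t_a$, gives $P_{RB}$. The sketch assumes exponential access times with mean $\bar t_a = (1-\beta)\bar t_{aMB} + \beta\bar t_{aRB}$. It computes the throughput by exact mean value analysis rather than by the thesis's own equations. All names are hypothetical.

```python
def mm1n_busy_probability(n_procs: int, t_p: float, t_a: float) -> float:
    """Probability that the server of an M/M/1//n_procs queue is busy,
    computed as throughput * t_a (Little's law) from exact MVA."""
    q = x = 0.0
    for n in range(1, n_procs + 1):
        r = t_a * (1.0 + q)   # response time seen by an arriving request
        x = n / (t_p + r)     # throughput with n processors
        q = x * r             # mean number of requests at the server
    return x * t_a


def p_rb(n_procs: int, t_p: float, t_amb: float, t_arb: float, beta: float) -> float:
    """Probability that a departing Ringbus request leaves another Ringbus
    request at the head of the Multibus queue (server-sharing approximation)."""
    t_a = (1.0 - beta) * t_amb + beta * t_arb
    # By the arrival theorem the departing request sees an (N - 1)-processor system.
    p_busy = mm1n_busy_probability(n_procs - 1, t_p, t_a)
    # Fraction of busy time spent on Ringbus accesses: beta * t_aRB / t_a.
    return p_busy * beta * t_arb / t_a


for n in (1, 2, 8, 32):
    print(n, p_rb(n, 10.0, 1.0, 2.0, 0.25))
```

For a single processor there is no other request to leave behind, so the sketch returns 0.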
2.10  Extensions

The  Multibus  models  considered  so  far  have  four  main  weaknesses:

1.  All  processors  are  identical.

2.  The  processor  model  is  very  simple,  perhaps  too  simple.

3.  The  processor  model  is  stationary,  i.e.  time  independent.

4.  All  processors  are  independent.

We  assumed  points  1  through  4  in  the  previous  sections  to  obtain  simple  and  analytically 
tractable  models.  In  this  section  we  consider  extensions  to  relax  each  of  these  assumptions. 

2.10.1  Non-identical  processors 

This case is straightforward to handle: we simply add more states to the Multibus model to represent the different combinations of non-identical processors. For example, we can change the state description of the M/M/1//N model in section 2.3 from $(n)$ to $(n, c_1, c_2, \ldots)$. In the first description, $n$ represents the number of requests waiting for or in service. In the second, $n$ is the same as before and $c_i$ represents the processor from which the $i$th request in the queue originated. In a sense, we now have $N$ classes of customers (for $N$ processors), one class per processor. Similarly, we can add classes to the Multibus model with Ringbus accesses in section 2.9 to distinguish the processors at which requests originate. For example, we could choose the classes $7(i-1)+1, 7(i-1)+2, \ldots, 7(i-1)+7$, $1 \le i \le N$, where $i$ denotes the originating processor and $7(i-1)+1, \ldots, 7(i-1)+7$ represent the 7 classes (as in section 2.9) associated with the originating processor $i$. A product form solution, similar to that developed in section 2.9.1.2, can be developed with respect to these classes.

Since the processors are now non-identical, the mean waiting time per request, $\bar t_W$, is not necessarily the same for the requests of all processors. This complicates the calculation of the throughput. It is probably best to consider the mean waiting time per request from processor $i$, $\bar t_{W_i}$, for all $i$, rather than the mean waiting time for any request, $\bar t_W$.

Note that while the case with non-identical processors is straightforward to handle, the required state space and the complexity of the analysis increase without necessarily contributing much insight.
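The following Python sketch makes the growth of the state space concrete by counting the states of the M/M/1//N model under the two state descriptions above. The description $(n)$ has $N+1$ states. The description $(n, c_1, c_2, \ldots)$ records the originating processor of each queued request in queue order. Because each processor has at most one outstanding request, it has one state for every ordered selection of distinct processors. The function names are hypothetical.

```python
from math import perm


def states_identical(n_procs: int) -> int:
    """State (n): n = 0, 1, ..., N requests at the Multibus."""
    return n_procs + 1


def states_non_identical(n_procs: int) -> int:
    """State (n, c_1, ..., c_n): an ordered sequence of n distinct processors,
    summed over n = 0, ..., N (each processor has at most one request)."""
    return sum(perm(n_procs, n) for n in range(n_procs + 1))


for n in (2, 4, 8):
    print(n, states_identical(n), states_non_identical(n))
```

For eight processors, the detailed description already needs 109601 states, compared with 9 for the aggregate one.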

2.10.2  More  Complex  Processor  Models 

This case can again be handled by increasing the number of states representing the Multibus model. We assume in this subsection that the processors are identical, independent, and stationary. These assumptions can be relaxed by the methods discussed in the preceding and succeeding subsections.
Within the assumptions stated above, we can make the processor model arbitrarily complex. Provided that we can find a Markovian state description of the processor, we can augment the state of the Multibus model with the state of each processor model and, in principle, solve for the steady-state probability distribution. Once we know the steady-state probability distribution, we can in principle determine any related performance measure of interest. The difficulty, of course, lies in the "in principle" part.

One quite general way to proceed is to approximate the entire Multibus model (including the processor models) by a queueing network model with a product form solution. One advantage of this approach is that we can deal with the model at a more abstract level. The states need not be Markovian. It suffices that each queue is quasi-reversible in isolation with respect to some Markovian state description, and we need not find or deal with such a description. A second advantage is that we can obtain analytical expressions for the steady-state distributions and hence for the performance measures of interest. A disadvantage is that some simplifying assumptions are inevitably involved. In some cases the necessary simplifying assumptions may obscure or eliminate the features of interest, and one must then resort to other methods such as simulation. (There is a paucity of methods for dealing with large non-product form systems.)

One way to extend the processor model using a queueing network model is to regard the processor operation as a set of activities, say $A_1, A_2, \ldots, A_m$. One activity might correspond to program execution in the processor's local memory, another to reading or writing global data, yet another to busy waiting on some global memory location, and so on. (Of course, with our independence assumption, the period of time spent busy waiting must be independent of the operation of the other processors.)

Each activity has associated with it:

- some interarrival time of requests for the Multibus;
- some interarrival time of requests for the Ringbus;
- a probability distribution for the time spent in that activity;
- a probability distribution for the next activity, which may depend on the previous activities and the time spent in each.

We can describe the overall Multibus model by a queueing network in which the activities are queues and the operation of the processors is represented by customers that move from queue to queue. Several queues may be necessary to describe each activity. The customers can belong to classes that represent the previous queue(s) visited, the service time at a queue, and so on, provided each queue remains quasi-reversible with respect to the classes. Finally, the transition from one class to another can be governed by a probability distribution depending only on the present class.
Figure 2.22: Queueing network model with processor activities

Figure 2.22 depicts a queueing network model for a three processor system with three activities per processor.

If each queue is quasi-reversible in isolation, then the global state probability has a product form solution. We assume that there is a total of one customer in all the classes associated with a single processor; more than one would correspond to a multi-tasking processor. Since there is therefore at most one customer per class, the service time distribution at each queue except the FCFS Multibus queue may be completely general. As discussed in sections 2.8 and 2.9, the Multibus queue is restricted: either its service time distribution must be exponential with the same mean for all customers, or its queue discipline must be server-sharing.

To illustrate the activity-based queueing network model more concretely, we consider the following general case.

Let there be $N$ processors, not necessarily identical. Let the model for processor $i$ consist of $Q(i)$ queues $Q^i_1, Q^i_2, \ldots, Q^i_{Q(i)}$ and the Multibus queue, which is common to all $N$ processor models. Let there be a finite set of customer classes $C(i)$ associated with each processor $i$. Each customer class visits at least one queue. Upon departing from a queue, a customer of class $k$ joins class $l$ with probability $r^i_{kl}$ (for processor $i$). The classes associated with processor $i$ form a single closed loop including all $Q(i)$ queues and the Multibus queue. Thus

$$\sum_{l \in C(i)} r^i_{kl} = 1 \qquad \text{for all } k \in C(i).$$
Let there be exactly one customer in the closed loop of classes corresponding to each processor. Thus the service time distribution at each queue except the Multibus queue may be completely general. (In addition, any customer at a queue other than the Multibus queue must be receiving service.) We assume that the service time distribution at each queue is independent of customer class. (For non-Multibus queues we can simply add more queues and classes to circumvent this restriction.) We also assume one of two Multibus configurations:

- the Multibus queue discipline is FCFS and the service time distribution is exponential; or
- the Multibus queue discipline is server-sharing and the service time distribution is general.

By adding a sufficient number of queues and classes, the general case just described can handle or approximate a wide range of activities and processor models. As stated earlier in this section, the classes can represent quite detailed history, such as previous queues visited and the service times at those queues. One can therefore even have an approximate distribution for the time spent in an activity, by defining classes that represent the time elapsed in that activity. (This technique will be discussed in more detail in section 2.10.3.) By construction, each queue in the general case just described is quasi-reversible in isolation. The global state probability therefore has a product form solution, which we now investigate.

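Let the global state be $X = (x_{P_1}, \ldots, x_{P_N}, y)$, where $x_{P_i}$ represents the state of processor $i$ and $y$ represents the state of the Multibus queue. Then we have the product form solution

$$\pi(X) = \frac{1}{G}\,\pi(y)\prod_{i=1}^{N}\pi(x_{P_i}),$$

where $G$ is a normalizing constant.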


Let $x_{P_i} = (q^i_1, \ldots, q^i_{Q(i)})$, where $q^i_j$ denotes the state of queue $j$ for processor $i$. Let $q^i_j = (n^i_j(l), n^i_j(k), \ldots)$ for each class $l, k, \ldots \in C(i)$, where $n^i_j(k)$ denotes the number of customers in class $k$ at queue $j$. Let $\lambda^i_k$ denote the effective arrival rate of class $k$ customers for processor $i$. Conservation of flow yields

$$\lambda^i_l = \sum_{k \in C(i)} \lambda^i_k\, r^i_{kl}, \qquad l \in C(i). \qquad (2.21)$$

Then the steady-state probability of state $q^i_j$ is
where $n^i_j = \sum_{k} n^i_j(k)$, $\rho^i_{jk} = s^i_j\,\lambda^i_k$, and $s^i_j$ is the mean service time at queue $j$ for processor $i$.
If we let $q^i_j = (n^i_j)$, we have

$$\pi(q^i_j) = (\rho^i_j)^{n^i_j},$$

where $\rho^i_j = \sum_{k \in C(i,j)} \rho^i_{jk}$ and $C(i,j)$ is the set of classes arriving at queue $j$ for processor $i$. Thus the steady-state probability of state $x_{P_i}$ is

$$\pi(x_{P_i}) = \prod_{j=1}^{Q(i)} \pi(q^i_j).$$

Let $n^i = \sum_{j} n^i_j$. Then, if we let $x_{P_i} = (n^i)$, we have (since $n^i = 0$ or 1)

$$\pi(x_{P_i}) = (\rho^i)^{n^i},$$

where

$$\rho^i = \sum_{j=1}^{Q(i)} \rho^i_j = \sum_{j=1}^{Q(i)} s^i_j \sum_{k \in C(i,j)} \lambda^i_k.$$


Let the state of the Multibus queue be represented by $y = (m_1, \ldots, m_N)$, where $m_i$ denotes the number of requests in the queue from processor $i$. Note that $m_i + n^i = 1$. Denote the effective arrival rate of customers at the Multibus queue from processor $i$ by $\lambda^i_{MB}$, i.e. $\lambda^i_{MB} = \sum_{k \in C(i,MB)} \lambda^i_k$, where $C(i,MB)$ is the set of all classes arriving at the Multibus from processor $i$.
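Therefore, if the global state is $X = (n^1, \ldots, n^N, m_1, \ldots, m_N)$, we have

$$\pi(X) = \frac{1}{G}\, m!\prod_{i=1}^{N} \bar t_a^{\,m_i}\left(\bar t^{\,eq}_{p_i}\right)^{n^i}, \qquad m = \sum_{i=1}^{N} m_i, \qquad (2.22)$$

where $\bar t^{\,eq}_{p_i} = \sum_{j=1}^{Q(i)} s^i_j\,\dfrac{\sum_{k\in C(i,j)}\lambda^i_k}{\lambda^i_{MB}}$, $\bar t_a$ is the mean Multibus access time, and $n^i = 1 - m_i$ with $m_i = 0$ or 1. The quantity $\dfrac{\sum_{k\in C(i,j)}\lambda^i_k}{\lambda^i_{MB}}$ is, for processor $i$, the ratio of the effective arrival rate at queue $j$ to the effective arrival rate at the Multibus queue.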
The mean waiting time per request from processor $i$, $\bar t_{W_i}$, and the mean waiting time per request for any request, $\bar t_W$, can be derived from equation 2.22.

We note the following two points about equation 2.22:

1. Equation 2.22 depends on the details of the model for processor $i$ only through the quantities $s^i_j$ and $\dfrac{\sum_{k\in C(i,j)}\lambda^i_k}{\lambda^i_{MB}}$ (for $j = 1, \ldots, Q(i)$). The former quantity is given. The latter can be computed by solving the conservation of flow equations 2.21, whose solution is determined only up to an arbitrary multiplicative constant; that constant cancels in the ratio. Thus the overall solution for $\bar t_{W_i}$ or $\bar t_W$ effectively reduces to solving a set of linear equations (equation 2.21) for each of the $N$ processors. Solving large sets of such equations is relatively easy. The main difficulty in applying queueing networks to model complex processor behaviour is therefore specifying the desired behaviour in terms of queues, service time distributions, and routing probabilities.
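Point 1 can be illustrated with a small Python sketch for one processor. It solves the conservation of flow equations 2.21 for the class arrival rates $\lambda^i_k$, fixing the scale by setting the Multibus arrival rate to 1. It then forms $\sum_j s^i_j\,(\sum_{k\in C(i,j)}\lambda^i_k)/\lambda^i_{MB}$. The example is hypothetical: a three-class routing matrix (two activity queues and the Multibus) with made-up service times, in which each class arrives at exactly one queue.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def class_rates(routing, mb_class):
    """Solve lambda_l = sum_k lambda_k r_kl (equation 2.21) with lambda[mb_class] = 1.
    The redundant balance equation for mb_class is replaced by the normalization."""
    n = len(routing)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for l in range(n):
        if l == mb_class:
            a[l][l], b[l] = 1.0, 1.0
        else:
            a[l][l] += 1.0
            for k in range(n):
                a[l][k] -= routing[k][l]
    return solve_linear(a, b)


def equivalent_processing_time(routing, mb_class, service):
    """Sum over non-Multibus classes of (mean service time at the queue the class
    arrives at) * (class arrival rate / Multibus arrival rate)."""
    lam = class_rates(routing, mb_class)
    return sum(service[k] * lam[k] for k in range(len(lam)) if k != mb_class)


# Class 0 arrives at activity queue A1 (mean 2.0), class 1 at A2 (mean 5.0),
# class 2 at the Multibus. routing[k][l] = probability class k becomes class l.
routing = [[0.0, 0.0, 1.0],
           [0.5, 0.0, 0.5],
           [0.7, 0.3, 0.0]]
service = [2.0, 5.0, None]
print(class_rates(routing, 2))                          # about [0.85, 0.3, 1.0]
print(equivalent_processing_time(routing, 2, service))  # about 3.2
```

Only the ratios to the Multibus rate matter, so any other normalization would give the same equivalent time.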

2. Consider the model in section 2.4 with exponential processing and access time distributions, but with non-identical processors. Let the global state be $X_{exp} = (m_1, \ldots, m_N)$, where $m_i$ is the number (0 or 1) of requests from processor $i$ in the Multibus queue. Then the steady-state probability of $X_{exp}$ is

$$\pi(X_{exp}) = \frac{1}{G}\, m!\prod_{i=1}^{N}\left(\frac{\bar t_a}{\bar t_{p_i}}\right)^{m_i}, \qquad m = \sum_{i=1}^{N} m_i, \qquad (2.23)$$

where $\bar t_{p_i}$ is the mean processing time of processor $i$. (Equation 2.23 follows from equation 2.5 in section 2.6.) Equations 2.22 and 2.23 are identical if $\bar t_{p_i}$ is replaced by

$$\bar t^{\,eq}_{p_i} = \sum_{j=1}^{Q(i)} s^i_j\,\frac{\sum_{k\in C(i,j)}\lambda^i_k}{\lambda^i_{MB}}.$$

Consider the most complicated stationary model, expressed as a queueing network model of the general form presented earlier. It has the same solution for $\bar t_{W_i}$ and $\bar t_W$ as the simple exponential processing and access time model, provided the appropriate $\bar t^{\,eq}_{p_i}$ is used. It is fascinating that the single parameter $\bar t^{\,eq}_{p_i}$ suffices in the solution of an (almost) arbitrarily complex model. Of course, the underlying reason for this result is the exponential access time or server-sharing discipline of the Multibus queue.
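For a modest number of processors, equation 2.23 can be evaluated by brute force. The Python sketch below enumerates all $2^N$ states $(m_1, \ldots, m_N)$ and weights each by $m!\prod_i(\bar t_a/\bar t_{p_i})^{m_i}$. It then derives $\bar t_{W_i}$ from Little's law. Processor $i$ issues requests at rate $\Pr(m_i = 0)/\bar t_{p_i}$, so each of its requests spends $\Pr(m_i = 1)$ divided by that rate at the Multibus. Subtracting the access time $\bar t_a$ leaves the mean waiting time. Following the result above, the $\bar t_{p_i}$ values passed in may equally be equivalent processing times $\bar t^{\,eq}_{p_i}$. The names and numbers are hypothetical.

```python
from itertools import product
from math import factorial


def waiting_times(t_p, t_a):
    """Mean waiting time per request for each processor, from the product form
    pi(m_1..m_N) ~ m! * prod_i (t_a / t_p[i])**m_i with m = sum of m_i."""
    n = len(t_p)
    g = 0.0
    busy = [0.0] * n  # unnormalized Pr(m_i = 1)
    for state in product((0, 1), repeat=n):
        w = float(factorial(sum(state)))
        for i, m_i in enumerate(state):
            if m_i:
                w *= t_a / t_p[i]
        g += w
        for i, m_i in enumerate(state):
            if m_i:
                busy[i] += w
    result = []
    for i in range(n):
        p_req = busy[i] / g                  # Pr(m_i = 1)
        rate = (1.0 - p_req) / t_p[i]        # request rate of processor i
        result.append(p_req / rate - t_a)    # Little's law, minus the access itself
    return result


print(waiting_times([10.0, 10.0], 1.25))       # identical processors
print(waiting_times([2.0, 10.0, 50.0], 1.25))  # non-identical processors
```

For identical processors the result coincides with the ordinary M/M/1//N waiting time; for two processors with these numbers it is $1.25/9$.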

A possible way to circumvent the difficulty mentioned in point 1 is now apparent:

1. Simulate, or actually run, a single processor with the desired complex behaviour on a system with exponentially distributed access times. (Simulation is perhaps the best way to achieve such access times.)
2. Measure the steady-state probability distribution.
3. Solve for the $\bar t_p$ that yields this same probability distribution with an exponentially distributed processing time.

Once this equivalent mean processing time, $\bar t^{\,eq}_{p_i}$, has been determined for each different processor, $\bar t_{W_i}$ and $\bar t_W$ can be computed from equation 2.23. For a large number of processors $N$, this indirect approach may be cheaper than the obvious alternative of simulating the entire system: $N$ simulation runs of a single processor may cost less than one simulation run of $N$ processors (for the same degree of accuracy).
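The inversion step of this procedure is simple for a single processor on an exponential Multibus. Equation 2.23 then has only two states, and the measured probability $q$ that the processor has a request at the Multibus satisfies $q = \bar t_a/(\bar t_p + \bar t_a)$. Solving for $\bar t_p$ gives the equivalent processing time, as in this Python sketch. The function name and the measured value are hypothetical.

```python
def equivalent_t_p(q_measured: float, t_a: float) -> float:
    """Invert q = t_a / (t_p + t_a) for t_p, given the measured steady-state
    probability q that the single processor has a request at the Multibus."""
    if not 0.0 < q_measured < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return t_a * (1.0 - q_measured) / q_measured


# E.g. a simulated processor found waiting on or using the Multibus 28% of the time.
print(equivalent_t_p(0.28, 1.25))
```

The values obtained in this way, one per processor, are the inputs to equation 2.23.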

Equation 2.22 is a fine result if the performance measures of interest involve just the status of the Multibus queue and not the status of any processors. If the measures of interest involve both the Multibus and the processors, then we cannot simplify the solution of the queueing network model to such a degree. Unfortunately, the state space may then remain large, and solving queueing networks with a large number of states is computationally expensive. Efficient techniques for handling such cases have been developed by Buzen [B3], Chandy, Herzog, and Woo [C2], Reiser and Kobayashi [R2], Reiser and Sauer [R3], Chandy and Sauer [C3], Lam [L1], and Lam and Lien [L2]. However, even these techniques require a lot of work when the state space is as enormous as it can easily become with complex models.

Another approach is available when the queueing network remains large after simplification, or when product form queueing network models are not applicable. The overall model is decomposed into more manageable submodels, each of which can be studied and solved independently, and the submodel results are then integrated to obtain an overall solution. Except in special circumstances, such a procedure yields only approximate results, so several iterations of decomposition and integration may be required to obtain results of sufficient accuracy.

2.10.3  Time Dependent Behaviour

This subsection is directed chiefly towards time dependent behaviour of the processing time distribution. We regard the access time distribution as mainly fixed by the hardware and thus time invariant. However, the probabilities of the different types of access (word vs. long word, and Multibus vs. Ringbus) may well be time dependent. If they are, they can be treated in the same manner as the processing time distribution.

We limit our discussion to processor behaviours that can be reasonably well approximated as time independent (i.e. stationary) on a finite number of nonzero time intervals. The idea is to represent each stationary interval of this piece-wise stationary approximation of the processor behaviour by a stationary submodel. The overall processor model then consists of:

- a finite set of such submodels, with exactly one submodel in effect at each point in time;
- the duration for which each submodel remains in effect;
- some strategy for choosing the next submodel when the time allotted to the present submodel is expended.

Each stationary submodel can be arbitrarily complex, such as the models discussed in sections 2.10.1 and 2.10.2, as long as it is stationary and independent of all other submodels.

We distinguish two cases, based on the time a submodel needs to approach steady state (i.e. for the transients to die out) relative to the duration of the submodel.

Case 1: For every submodel, the time required to approach steady state is short relative to the duration of the submodel. (We will not discuss what is "short" enough.) In this case it may be reasonable to approximate the behaviour of each submodel over its entire duration by its steady-state behaviour. The behaviour of the overall model can then be approximated as a piece-wise function of the steady-state behaviour on each submodel interval. It is then probably best to represent any performance measure of interest for the overall model by a vector of such performance measures, with each element of the vector corresponding to a different submodel. Knowing the duration of each submodel and the strategy for choosing submodels, we can determine the average of any performance measure from its performance measure "vector" over all the submodels. Note, however, that such an average may not be very meaningful; at the least, it must be carefully interpreted. Note also that the steady-state behaviour of each submodel can be determined simply by treating it as the only submodel. Thus this case has the important attribute that the overall model can be decomposed into a number of smaller, independent submodels.
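One way to form the Case 1 average is sketched below in Python. It rests on two assumptions that the text leaves open. First, the strategy for choosing the next submodel is a Markov chain with transition matrix $P$. Second, each submodel's steady-state measure applies for the whole of its mean duration. By a renewal-reward argument, the long-run time average is then $\sum_k \pi_k d_k x_k / \sum_k \pi_k d_k$. Here $\pi$ is the stationary distribution of the submodel chain, $d_k$ is the mean duration of submodel $k$, and $x_k$ is its steady-state measure. All names are hypothetical.

```python
def stationary_distribution(p):
    """Stationary distribution of an irreducible Markov chain: solve pi P = pi,
    sum(pi) = 1, by Gauss-Jordan elimination with partial pivoting."""
    n = len(p)
    # Rows 0..n-2: balance equations sum_k pi_k (P[k][j] - delta_kj) = 0.
    m = [[p[k][j] - (1.0 if k == j else 0.0) for k in range(n)] + [0.0]
         for j in range(n - 1)]
    m.append([1.0] * n + [1.0])  # normalization replaces the last balance equation
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def time_average(p, durations, measures):
    """Long-run time average of a per-submodel steady-state measure."""
    pi = stationary_distribution(p)
    weights = [pi_k * d_k for pi_k, d_k in zip(pi, durations)]
    return sum(w * x for w, x in zip(weights, measures)) / sum(weights)


# Two submodels used alternately, lasting 1 and 3 time units on average, with
# steady-state measures 10 and 2: the average is (10*1 + 2*3) / 4 = 4.
print(time_average([[0.0, 1.0], [1.0, 0.0]], [1.0, 3.0], [10.0, 2.0]))
```

This weighting is by time. A per-request measure such as $\bar t_W$ would instead be weighted by the number of requests issued in each submodel, which is one reason the text warns that such averages must be carefully interpreted.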

Case 2: For at least one submodel, the time required to approach steady state is not short relative to the duration of the submodel. This case is more difficult, since the dynamics of the overall model preclude treating it as a set of independent submodels. (There are certainly situations in which some, but not all, of the submodels can be treated as independent and approximated by their steady-state behaviour over their entire duration. Perhaps such hybrid situations should be called Case 3.) To handle Case 2, the state description must somehow include the submodel in effect, the time expended (or remaining) in its duration, and perhaps some past history of submodel choices too. Of course, we can specify a Markov process that incorporates this additional information*, but this brings us back to the more abstract queueing network models. In fact, it brings us back to the activity-based queueing network model discussed in the previous subsection.

We consider each submodel to be an activity with some probability distribution, $F_i(t)$, for the time spent in that activity, and some probability distribution for the next activity to enter given the current activity. (This latter probability can be generalized to depend on past activities.) We represent the set of activities as a queueing network model as follows.

* It will almost certainly turn out that such Markov processes are too difficult to treat analytically except in trivial cases.

Let there be one queue per activity. The service time at this queue has the same (stationary) distribution as the processing time of that activity. (We consider a situation in which the processing time distribution of the overall model is not stationary. We do not consider here any other complications of the basic queueing model discussed in section 2.1.) Let the classes associated with the queue represent the total amount of processing time elapsed so far while in that activity. Specifically, let there be classes

1. $c^1(i, n\Delta t)$, representing a request for the Multibus from activity $i$ where the cumulative processing time while in activity $i$ is $t \in [n\Delta t, (n+1)\Delta t)$, and

2. $c^2(i, n\Delta t)$, representing a request returning from Multibus service to activity $i$ with cumulative processing time while in activity $i$ of $t \in [n\Delta t, (n+1)\Delta t)$.

(We quantize time so that we can deal with discrete probabilities for the time being. We have chosen quanta of uniform size for simplicity of presentation.)

The routing probabilities at queue $i$ (i.e. the queue associated with activity $i$) are:

$$p\big(c^2(i,n\Delta t);\, c^1(j,m\Delta t)\big) = \begin{cases} \displaystyle\int_{(m-n-1)\Delta t}^{(m-n)\Delta t} f_{p_i}(s)\,ds, & \text{if } m > n \text{ and } j = i,\\[2ex] 0, & \text{otherwise,}\end{cases}$$

where $f_{p_i}(t)$ is the probability density function (pdf) of the processing time at queue $i$. The routing probabilities at the Multibus queue are:

$$p\big(c^1(i,n\Delta t);\, c^2(j,l)\big) = \begin{cases} c^i(n\Delta t)\,p_{ij}, & \text{if } l = 0 \text{ and } j \ne i,\\ 1 - c^i(n\Delta t), & \text{if } l = n\Delta t \text{ and } j = i,\\ 0, & \text{otherwise.}\end{cases}$$

Here $p_{ij}$ is the probability that the next activity is $j$ given that the current activity is $i$ ($\sum_{j \ne i} p_{ij} = 1$), and $c^i(n\Delta t)$ is the probability that activity $i$ ends in $[n\Delta t, (n+1)\Delta t)$ given that the sum of the processing times incurred while in this activity is at least $n\Delta t$, i.e.
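$$c^i(n\Delta t) = \begin{cases} \dfrac{\int_{n\Delta t}^{(n+1)\Delta t} f_{d_i}(s)\,ds}{1 - \Pr(d_i < n\Delta t)}, & \text{if } \Pr(d_i < n\Delta t) < 1,\\[2ex] 1, & \text{if } \Pr(d_i < n\Delta t) = 1,\end{cases}$$

where $f_{d_i}(t)$ is the pdf of the duration $d_i$ of activity $i$.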
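The quantity $c^i(n\Delta t)$ is a discrete-time hazard of the activity duration. The Python sketch below computes it from a duration CDF $F(t) = \Pr(d_i \le t)$. It uses $\int_{n\Delta t}^{(n+1)\Delta t} f_{d_i}(s)\,ds = F((n+1)\Delta t) - F(n\Delta t)$ and treats $d_i$ as continuous, so that $\Pr(d_i < n\Delta t) = F(n\Delta t)$. For an exponentially distributed duration with rate $\mu$, the result should be the constant $1 - e^{-\mu\Delta t}$, and the example checks this. The function names and parameter values are hypothetical.

```python
import math


def end_probability(cdf, n: int, dt: float) -> float:
    """c(n*dt): probability that an activity ends in [n*dt, (n+1)*dt), given that
    its cumulative processing time has reached n*dt. cdf(t) = Prob(d <= t) for a
    continuous duration d, so Prob(d < n*dt) = cdf(n*dt)."""
    survived = 1.0 - cdf(n * dt)
    if survived <= 0.0:          # Prob(d < n*dt) = 1: the activity ends now
        return 1.0
    return (cdf((n + 1) * dt) - cdf(n * dt)) / survived


def exponential_cdf(t: float, mu: float = 0.2) -> float:
    return 1.0 - math.exp(-mu * t)


def uniform_cdf(t: float, length: float = 4.0) -> float:
    return min(max(t / length, 0.0), 1.0)


dt = 1.0
print([end_probability(exponential_cdf, n, dt) for n in range(4)])  # each approximately 1 - exp(-0.2)
print([end_probability(uniform_cdf, n, dt) for n in range(6)])
```

For the uniform duration on $[0, 4]$, the hazard rises from 0.25 to 1 by the fourth quantum, which forces the activity to end.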

We now have a queueing network model of the form discussed in the previous subsection. From equation 2.23 we know what the steady-state probability distribution of customers in the Multibus queue depends on (and from this distribution we can determine $\bar t_{W_i}$ and $\bar t_W$). It depends on the mean processing time while in each activity, and on the ratio of the effective arrival rate at each queue to the effective arrival rate at the Multibus.

Denote the effective arrival rate of class $c^k(i,n\Delta t)$ customers ($k = 1, 2$) by $\lambda(c^k(i,n\Delta t))$. Then the ratio $\dfrac{\sum_{k\in C(i,j)}\lambda^i_k}{\lambda^i_{MB}}$ of equation 2.22, for the queue of activity $i$, is given by

$$\frac{\sum_{n=0}^{\infty}\lambda\big(c^2(i,n\Delta t)\big)}{\sum_{i'}\sum_{n=0}^{\infty}\lambda\big(c^1(i',n\Delta t)\big)}. \qquad (2.24)$$

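The conservation of flow equations are

$$\lambda\big(c^1(i,m\Delta t)\big) = \sum_{n=0}^{m-1}\int_{(m-n-1)\Delta t}^{(m-n)\Delta t} f_{p_i}(s)\,ds\;\lambda\big(c^2(i,n\Delta t)\big),$$

$$\lambda\big(c^2(i,m\Delta t)\big) = \big(1 - c^i(m\Delta t)\big)\,\lambda\big(c^1(i,m\Delta t)\big) + \delta_{m0}\sum_{j\ne i}\sum_{n=0}^{\infty} p_{ji}\,c^j(n\Delta t)\,\lambda\big(c^1(j,n\Delta t)\big),$$

where $\delta_{m0}$ is 1 if $m = 0$ and 0 otherwise. Manipulating these equations, we have

$$\sum_{m=0}^{\infty}\lambda\big(c^1(i,m\Delta t)\big) = \sum_{m=0}^{\infty}\lambda\big(c^2(i,m\Delta t)\big) \qquad (2.25)$$

and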




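$$\lambda\big(c^1(i,m\Delta t)\big) = \sum_{n=0}^{m-1}\int_{(m-n-1)\Delta t}^{(m-n)\Delta t} f_{p_i}(s)\,ds\,\big(1 - c^i(n\Delta t)\big)\,\lambda\big(c^1(i,n\Delta t)\big) + \int_{(m-1)\Delta t}^{m\Delta t} f_{p_i}(s)\,ds \sum_{j\ne i}\sum_{n=0}^{\infty} p_{ji}\,c^j(n\Delta t)\,\lambda\big(c^1(j,n\Delta t)\big) \qquad (2.26)^{\dagger}$$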

The ratio in equation 2.24 is determined by the solution of equation 2.26 and the identity 2.25. The system of linear equations 2.26 can be solved for $\lambda(c^1(i,m\Delta t))$ up to an arbitrary constant. Therefore, as in the previous subsection, the overall solution for $\bar t_{W_i}$ and $\bar t_W$ effectively reduces to solving a set of linear equations for each of the $N$ processors.
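The quantized equations can be solved in two stages, which the Python sketch below follows. In the first stage, the flows $\lambda(c^1(i,m\Delta t))$ and $\lambda(c^2(i,m\Delta t))$ within one stay in activity $i$ are computed per unit rate of entry into the activity. This can be done forward in $m$, because quantum $m$ depends only on earlier quanta. The sum of these flows, $V_i$, is the expected number of Multibus requests per stay; identity 2.25 makes the $c^1$ and $c^2$ sums equal. In the second stage, the entry rates into the activities are taken proportional to the stationary distribution $a$ of the activity-transition matrix $(p_{ij})$. The ratio 2.24 for activity $i$ is then $a_i V_i/\sum_k a_k V_k$, and $\bar t^{\,eq}_p = \sum_i s_i a_i V_i / \sum_k a_k V_k$, where $s_i$ is the mean processing time in activity $i$. The sketch assumes every activity ends by a truncation quantum $M$, i.e. $c^i(m\Delta t) = 1$ for $m \ge M$. The function names and the two-activity example are hypothetical.

```python
def requests_per_stay(proc_cdf, end_prob, dt, m_max):
    """Expected number of processing periods (= Multibus requests) in one stay
    in an activity, per unit rate of entry into the activity.

    proc_cdf(t): CDF of one processing period, with proc_cdf(0) = 0.
    end_prob(m): c(m * dt); assumed equal to 1 for every m >= m_max."""
    def w(d):  # probability that one processing period advances d quanta
        return proc_cdf(d * dt) - proc_cdf((d - 1) * dt)

    lam2 = [0.0] * m_max      # flow of class c2(i, m*dt) per unit entry rate
    lam2[0] = 1.0             # entering the activity with no elapsed processing
    for m in range(1, m_max):
        lam1 = sum(w(m - n) * lam2[n] for n in range(m))  # class c1(i, m*dt)
        lam2[m] = (1.0 - end_prob(m)) * lam1              # activity continues
    return sum(lam2)          # equals the c1 total by identity 2.25


def stationary(p):
    """Stationary distribution of the activity-transition matrix p (Gauss-Jordan)."""
    n = len(p)
    m = [[p[k][j] - (1.0 if k == j else 0.0) for k in range(n)] + [0.0]
         for j in range(n - 1)]
    m.append([1.0] * n + [1.0])  # normalization replaces one balance equation
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def equivalent_processing_time(p, mean_proc, visits):
    """t_p^eq = sum_i s_i a_i V_i / sum_k a_k V_k, using the ratio of equation 2.24."""
    a = stationary(p)
    flows = [a_i * v_i for a_i, v_i in zip(a, visits)]
    return sum(s * f for s, f in zip(mean_proc, flows)) / sum(flows)


def deterministic_cdf(value):
    return lambda t: 1.0 if t >= value else 0.0


# Hypothetical example with dt = 1: activity 0 has processing periods of 0.5
# and always lasts 3 quanta; activity 1 has periods of 0.9 and ends after one
# request; the processor alternates between the two activities.
v0 = requests_per_stay(deterministic_cdf(0.5), lambda m: 1.0 if m >= 3 else 0.0, 1.0, 3)
v1 = requests_per_stay(deterministic_cdf(0.9), lambda m: 1.0 if m >= 1 else 0.0, 1.0, 1)
t_eq = equivalent_processing_time([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.9], [v0, v1])
print(v0, v1, t_eq)
```

Here $V_0 = 3$ and $V_1 = 1$, so three of every four Multibus requests come from activity 0, and $\bar t^{\,eq}_p = (0.5 \cdot 3 + 0.9 \cdot 1)/4 = 0.6$.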

It is highly desirable to keep the number of time quanta fairly small, so that the number of equations to solve in 2.26 is not enormous. The inaccuracy introduced into the solution by the quantization can be estimated by comparing the solution with one obtained using a larger number of quanta.

Finally, this treatment of nonstationary processor behaviour can be extended, along the lines of the previous subsection, to deal with more complex processor behaviour.

† Upon taking the limit $\Delta t \to 0$, equation 2.26 becomes a set of Volterra integral equations of the second kind [H3].

2.10.4  Dependent Processors

By dependent processors we mean that, for at least one processor $i$, there exists some time $t$ such that the operation of the processor is not statistically independent of the operation of processor $j$ for all $j \ne i$ and for all times $s < t$. To model dependent processors, the state of a processor must be allowed to depend on the state of other processors. Unfortunately, this dependency precludes the use of queueing network models with product form solutions, as we have pursued them to this point in this thesis. The reasoning is as follows.

In a queueing network model, the state of a processor is given by the concatenation of the states of all queues representing that processor. Alternatively, we can view the state of a processor as given by the class of its one customer. (There can only be one customer per processor because we are modeling the Multibus at the memory access level and processors are single tasking, i.e. a processor is idle while it has a Multibus memory access pending or in progress.) Thus, if the states of two processors are dependent, then some of the respective classes of the processors are dependent. That is, the present class of the customer for one processor may determine the present or future class of the customer for another processor. But a product form solution is not guaranteed if the class of one customer depends on the class of another, since the routing of customers then effectively depends on the state of the queueing network. (Walrand's proof [W1] of the product form for networks of quasi-reversible queues requires that the routing be independent of everything else.)

For two processors, we can attempt to circumvent the difficulty imposed by dependent classes by introducing "superclasses" to represent all possible pairs $(C_i, C_j)$, where $C_i$ denotes a customer class for processor $i$. (This can be generalized to more than two processors.) A change in class at processor $i$ then forces a change in the superclass, which also forces a change in class at processor $j$. A product form solution can be developed with respect to the superclasses. However, a customer in a superclass can have only one service time distribution at each queue. Yet such a customer represents two customers, of classes $C_i$ and $C_j$ respectively, from different processors with possibly vastly different service times at the queues. A queueing network model with superclasses therefore does not represent the original queueing network model unless, for each superclass, the classes $C_i$ and $C_j$ have the same service time distribution at each queue for the two processors. And if this is the case, the processors are not dependent. Thus we cannot guarantee that a queueing network model for dependent processors both possesses a product form solution and represents the operation of the processors.

The above reasoning implies that we cannot model synchronization and mutual exclusion, two principal forms of dependency between processors, with queueing networks and expect product form solutions. In addition, it is well known [S1] that product form solutions cannot be expected for queueing network models involving multiple resource possession. Multiple resource possession occurs when a customer at one queue requires simultaneous service at several queues, thus "possessing" the service resources of those queues. An example of multiple resource possession in Concert is a Ringbus memory access. Such an access requires the simultaneous possession of the Multibus and the Ringbus. Thus a product form solution cannot be expected if we model Concert as a queueing network with a queue for the Multibus and a queue for the Ringbus. This is one reason why we have chosen to decompose Concert into separate Multibus and Ringbus models and to regard Ringbus memory accesses as simply requiring a different service time at the Multibus queue.

All the dependencies mentioned above can be handled with sufficiently detailed Markov chain models. However, such models suffer from a relatively low level of abstraction: the structure of the model is often obscured, and one's energy is misdirected by the details of Markov state definitions and transitions. Stochastic Petri Nets (SPNs) [M2] allow modeling at a higher level of abstraction than Markov chains and can easily handle the sort of dependencies mentioned above. An SPN model is less complex, easier to construct, and more likely to be correct than an equivalent Markov chain model.

A Petri Net is a set $P$ of places, a set $T$ of transitions, a set $\alpha$ of directed arcs from places to transitions, a set $\beta$ of directed arcs from transitions to places, and some initial placement of tokens in places (called a marking). Arcs incident on a given transition are called input arcs, and the places from which these arcs emanate are called input places for that transition. Arcs emanating from a given transition are called output arcs, and the places at which these arcs terminate are called output places for that transition. A transition is enabled when there is at least one token in each of its input places. After a transition is enabled, it fires immediately, removing one token from each of its input places and adding one token to each of its output places. (There can be more than one token at a place.) A simple Petri Net is illustrated in Figure 2.23. The circles represent places, the bars represent transitions, and the dots represent tokens. See Peterson [P2] for an extensive discussion of Petri Nets and their properties.
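The enabling and firing rules just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: a marking is a `Counter` of token counts, each transition is a pair (input places, output places) with one token per arc, and the places and transitions in the example (`idle`, `waiting`, `using_bus`, `bus_free`, `request`, `grant`, `release`) describe a hypothetical bus-sharing net, not the net of Figure 2.23.

```python
from collections import Counter

# Hypothetical representation: a transition is (input places, output places),
# with each arc carrying exactly one token.
Transition = tuple[list[str], list[str]]


def enabled(marking: Counter, t: Transition) -> bool:
    """A transition is enabled when each of its input places holds a token."""
    inputs, _ = t
    return all(marking[p] >= 1 for p in inputs)


def fire(marking: Counter, t: Transition) -> Counter:
    """Fire an enabled transition: remove one token from each input place and
    add one token to each output place. Returns a new marking."""
    if not enabled(marking, t):
        raise ValueError("transition is not enabled")
    inputs, outputs = t
    new = Counter(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] += 1
    return new


# Hypothetical net: two processors share a single bus token.
net = {
    "request": (["idle"], ["waiting"]),
    "grant": (["waiting", "bus_free"], ["using_bus"]),
    "release": (["using_bus"], ["idle", "bus_free"]),
}

m = Counter({"idle": 2, "bus_free": 1})
m = fire(m, net["request"])      # first processor requests the bus
m = fire(m, net["request"])      # second processor requests the bus
m = fire(m, net["grant"])        # one of them takes the bus token
print(enabled(m, net["grant"]))  # False: no bus token is left
```

Firing `grant` consumes the single `bus_free` token, so the second waiting processor cannot proceed until `release` returns it. This is the kind of dependency between processors that, as argued above, rules out product form solutions.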


Figure 2.23: A simple Petri net


A Stochastic Petri Net (SPN) is a Petri Net with the following modification. Associated with each transition is a random variable which specifies the interval, called the firing time, between the enabling of that transition and its firing (given that the transition is still enabled at that time). At the instant at which a transition fires, and not before, one token is removed from each of its input places and one token is added to each of its output places. Thus the firing of one transition may cause the disabling of another transition. The probability distribution of the firing time is given, and may be different for each transition. (Petri Nets can also be made stochastic by incorporating probabilistic service times at each place.) With appropriate probability distributions for the transitions $T_1$, $T_2$, $T_3$, and $T_4$, Figure 2.23 represents an SPN model of a two processor Multibus system. ($T_1$ represents the processing time of processor 1, $T_2$ represents the access time of processor 1, $T_4$ represents the processing time of processor 2, and $T_3$ represents the access time of processor 2.) More complex SPN models of processors can be developed easily. Performance measures similar to those derived with our other modeling techniques can be derived from an SPN. Molloy [M2] has shown that SPNs with exponential firing time distributions are isomorphic to one-dimensional Markov chains, and thus the performance measures of interest for such SPNs can be determined from their equivalent Markov chains. However, with Molloy's technique relatively small SPNs result in large Markov chains. Such state space explosion makes Molloy's technique unattractive for determining the performance measures of larger SPNs. Wiley [W3] has developed techniques that are more efficient and more general.
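The correspondence between exponential SPNs and Markov chains can be illustrated by generating the reachable markings of a small net and solving the resulting continuous-time Markov chain for its stationary distribution. The sketch below is an assumption-laden illustration, not Molloy's or Wiley's actual procedure, and the net is hypothetical rather than a reconstruction of Figure 2.23: a processor finishes processing at rate `lam`, acquires the bus token through a `grant` transition of rate `gamma`, and releases it after an access of rate `mu`. Every transition is assumed to have single-server semantics, and all function and place names are illustrative.

```python
from collections import deque

# Hypothetical SPN: each transition is (input places, output places, rate),
# with exponentially distributed firing times and single-server semantics.
PLACES = ("think1", "wait1", "access1", "think2", "wait2", "access2", "bus")


def two_processor_net(lam, gamma, mu):
    net = {}
    for i in (1, 2):
        net[f"proc{i}"] = ((f"think{i}",), (f"wait{i}",), lam)            # processing ends
        net[f"grant{i}"] = ((f"wait{i}", "bus"), (f"access{i}",), gamma)  # bus acquired
        net[f"acc{i}"] = ((f"access{i}",), (f"think{i}", "bus"), mu)      # access ends, bus freed
    return net


def successors(marking, net, places):
    """Yield (transition, rate, next marking) for every enabled transition."""
    pos = {p: k for k, p in enumerate(places)}
    for name, (ins, outs, rate) in net.items():
        if all(marking[pos[p]] >= 1 for p in ins):
            nxt = list(marking)
            for p in ins:
                nxt[pos[p]] -= 1
            for p in outs:
                nxt[pos[p]] += 1
            yield name, rate, tuple(nxt)


def solve(a, b):
    """Dense Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(net, places, initial):
    """Build the reachability graph (the equivalent Markov chain) and solve pi Q = 0."""
    index, order, edges = {initial: 0}, [initial], []
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for name, rate, t in successors(s, net, places):
            if t not in index:
                index[t] = len(order)
                order.append(t)
                queue.append(t)
            edges.append((index[s], index[t], rate, name))
    n = len(order)
    q = [[0.0] * n for _ in range(n)]
    for i, j, rate, _ in edges:
        q[i][j] += rate
        q[i][i] -= rate
    # Balance equations Q^T pi = 0, with the last one replaced by sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    pi = solve(a, [0.0] * (n - 1) + [1.0])
    return order, pi, edges


def throughput(pi, edges, names):
    """Mean firing rate of the named transitions."""
    return sum(pi[i] * rate for i, _, rate, name in edges if name in names)


net = two_processor_net(lam=1.0, gamma=1e6, mu=2.0)
states, pi, edges = steady_state(net, PLACES, (1, 0, 0, 1, 0, 0, 1))
x = throughput(pi, edges, {"acc1", "acc2"})
print(len(states), round(x, 4))
```

With a very fast `grant` transition, this net behaves like two processors sharing one exponential server, so the bus throughput should approach the machine-repairman value μ(1 − p₀) = 1.2 for λ = 1 and μ = 2. Even this toy net has 8 reachable markings; adding processors or access types multiplies that count, which is the state space explosion noted above.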
2.11 Conclusions

1. The general behaviour of the mean waiting time per request is similar to that depicted in Figure 2.3: the position of the knee and the asymptotic slope depend on the mean processing time, the mean Multibus access time, the mean recovery time $t_r$, the mean Ringbus access time, the probability of a long word access, and the probability of a Ringbus access. The exact shape of the curve of waiting time per request versus number of processors depends on the probability distributions for the processing, recovery, and access times. Generally, the more "deterministic" these distributions are (i.e. the smaller the variance of the associated random variables), the shallower the knee is. In fact, the mean waiting time per request with deterministic processing, recovery, and access times provides a lower bound on the mean waiting time per request.
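Conclusion 1 can be made concrete with a deliberately simplified model that keeps only a mean processing time `z` and a mean Multibus access time `a` (no recovery time, long word accesses, or Ringbus accesses; both names are ours). For exponential times, the mean waiting time per request follows from exact mean value analysis (MVA) of a closed network with one delay station and one FCFS queue. The piecewise-linear curve max(0, Na − z − a) is the standard asymptotic lower bound for any distributions with these means, with its knee at N* = (z + a)/a and asymptotic slope a. This is a sketch under those assumptions, not the Multibus model of this chapter.

```python
def mva_waiting(n_max, z, a):
    """Exact MVA for 1..n_max processors, each alternating an exponential
    processing time (mean z) with an exponential bus access (mean a) at a
    single FCFS bus. Returns the mean waiting time per request (queueing
    delay only, service excluded) for N = 1, ..., n_max."""
    q = 0.0  # mean number of requests at the bus with N - 1 processors
    waits = []
    for n in range(1, n_max + 1):
        r = a * (1.0 + q)  # arrival theorem: an arrival sees the (N-1)-processor queue
        x = n / (z + r)    # throughput, by Little's law over the whole cycle
        q = x * r          # Little's law applied at the bus
        waits.append(r - a)
    return waits


def waiting_lower_bound(n, z, a):
    """Asymptotic bound: no waiting before the knee, slope a beyond it."""
    return max(0.0, n * a - z - a)


z, a = 8.0, 1.0  # illustrative values; knee at N* = (z + a) / a = 9
exp_waits = mva_waiting(16, z, a)
for n, w in enumerate(exp_waits, start=1):
    print(f"N={n:2d}  exponential={w:6.3f}  bound={waiting_lower_bound(n, z, a):6.3f}")
```

The exponential curve rises gradually through the knee, while the bound turns sharply at N* = 9; for large N both approach the same asymptote of slope a.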

2. The mean waiting time per request can be more sensitive to the probabilities of long word and Ringbus accesses than to the probability distributions for the processing, recovery, and access times. In the cases that we simulated (in section 2.8.1.4), we found that the mean waiting time with various probability distributions was fairly close to that obtained with exponential probability distributions. This suggests that future effort be spent determining appropriate values or ranges of values for these two probabilities and assessing the adequacy of our simple processor model.

3. The assumptions of identical processors and a simple processor model can be removed, as discussed in section 2.10, by expanding our basic queueing network approach. The assumption of time independent behaviour can also be removed by expanding the queueing network approach, provided the time dependent behaviour can be reasonably approximated by piecewise time independent behaviour. This approach is trivial in the special case when the overall model can be decomposed into independent submodels for each time scale. Otherwise, this approach is very complicated and probably unreasonably difficult for all but simple models. The assumption of independent processors is the most difficult to remove. In fact, it cannot be removed by any expansion of our queueing network approach (unless one is willing to sacrifice tractability and consider networks without a product form solution). As discussed in section 2.10.4, the behaviour of the Multibus with dependent processors can be modeled with low level Markov chain models or, preferably, with higher level models such as Stochastic Petri Nets.

4. The performance of the Multibus can be improved by the following:

i) reduce the frequency of long word and Ringbus accesses. Ringbus accesses are especially detrimental to performance because of their extremely long duration, during which all Multibus traffic is blocked. In the actual Concert system, the minimum duration of a Ringbus access is 2.00 µsec and the maximum duration (assuming no test and set instructions) is $7 \times (10 \times 0.200\ \mu\text{sec}) + 2.70\ \mu\text{sec} = 16.70\ \mu\text{sec}$, where the first term is the maximum duration for which the required segments can be allocated to other requests. Most Ringbus accesses will have a duration somewhere between these two extremes, depending on the processing time, , and

Two ways to avoid blocking Multibus traffic when a Ringbus access occurs are to:

a) replace the Multibus by two or more parallel buses, or perhaps just add a private bus for Ringbus accesses, and

b) divide the memory transaction protocol into a memory operation component and an acknowledgment component that occur at separate times, between which control of the Multibus may be relinquished to other memory transactions.

Both of these options are costly, although (a) is probably less costly.

ii) decrease the overhead time on non-local memory accesses. Each non-local memory access experiences 100 to 200 nsec of delay due to the Multibus arbiter, as well as substantial delays in asserting the BREQ* (Multibus request) signal upon detecting a non-local memory access and in asserting the address and control signals once the BPRN* (Multibus grant) signal is asserted.

iii) reduce the Ringbus access time.

2.12 Future Work Required

1. Evaluate the single processor equivalent model and the Multibus models. Derive appropriate values for the processor model parameters from real programs and compare the performance predicted by the Multibus models with the actual performance.

Ali [A1] has performed some work in this direction. He found excellent agreement between predicted and actual performance of the simple Multibus (no long word or Ringbus accesses) for some artificial programs emulating the simple processor model. For the "real" programs which Ali considered, he found time dependent behaviour to be very important, suggesting that stationary models are inadequate.

2. Improve the processor and Multibus models and develop new ones.

The existing models can be improved to some degree as discussed in section 2.10. However, a better direction in which to proceed is to develop higher level models, such as Stochastic Petri Net models. Time and processor dependencies are easier to model at higher levels.
The Ringbus Model


3.1 Introduction

In this chapter we study the Ringbus subsystem. As discussed in section 1.3.5, we replace each Multibus by a single processor equivalent model. We assume that each Multibus, and thus each single processor equivalent model, is identical in all respects. We also assume that the Ringbus is symmetrical with respect to each Multibus. We make these assumptions so that we can use the abundant symmetry that they imply to simplify considerably the analysis of the Ringbus and the integration of the Multibus and Ringbus models. The treatment in this chapter can easily be extended formally (although not so easily in practice) to deal with situations in which these assumptions are not valid. We assume an exponential distribution for the processing time of each single processor equivalent. The reason for this is again to ease analysis. We make no assumptions at this point about the access time distribution: indeed, this distribution is one of the factors studied in this chapter.

The focus of this chapter is the optimum performance of the Ringbus. There are three reasons for this emphasis on the optimum performance. First, the Ringbus is a novel interconnection scheme which has not been studied previously (as far as we know). Thus, knowing the optimum performance of the Ringbus satisfies a natural curiosity. Second, the theoretical maximum improvement in performance of any particular Ringbus design (including the design utilized in Concert) can be determined from the optimum performance of the Ringbus. This theoretical performance improvement is useful in evaluating Ringbus designs. Third, knowledge of the optimum performance of the Ringbus allows the Ringbus to be compared with other interconnection schemes in terms of optimum performance. Since the Ringbus is a novel interconnection scheme, its optimum performance is important in establishing the merit of Ringbus-like schemes relative to other interconnection schemes.
To avoid getting overwhelmed by details or trapped by the small and relatively unimportant differences between various Ringbus designs, we take an abstract view of the Ringbus. This abstract view is as follows. The Ringbus and Ringbus arbiter operate synchronously with an arbiter clock of period $c$. Requests for the interconnection of source and destination slices arrive from the Multibuses (or in this case the single processor equivalent models of the Multibuses) asynchronously with respect to the arbiter clock. On each rising clock edge, the arbiter examines all pending requests and then instantaneously decides which requests should be granted and how they should be granted. This decision is implemented immediately, so that there is zero delay from the rising edge of the arbiter clock to the time that a segment allocated to a request is used. Once granted, a request lasts exactly some integral number of arbiter clock periods. We assume, without loss of generality, that the duration of a grant (which is what we call a granted request) is encoded in its request rather than determined by the number of clock periods before the request is removed (as it is in the Concert system).^ Requests remain pending until they are eventually granted. We consider the Ringbus itself to be just a ring of bus segments under the control of a central arbiter.

The abstract view of the Ringbus given above is really a set of simplifying assumptions. We list the most important of these assumptions below.

1) We ignore the delays of the RIB circuitry, including the delay to mitigate metastability when latching the asynchronous request signals from the Multibus.

2) We assume zero arbitration time and zero delay in connecting the bus segments of the Ringbus.

3) We assume grant durations of an integral number of arbiter clock periods.

4) We assume that the minimum time between the termination of a grant of some slice and the next nonnull request from that slice is zero.

In addition, we assume there are no global register accesses.

We term the abstract view of the Ringbus summarized by the above assumptions the isolated Ringbus model. In section 3.9 we discuss the differences between the environment of the isolated Ringbus model (created by these assumptions) and the environment of the Ringbus in the actual Concert system. We also consider the effects these differences have on the performance of the Ringbus. The Multibus-Ringbus interaction, which is simplified by assumptions 1 and 4 above, is discussed in detail in section 3.9.1, and for the actual Concert system in section 3.3.2 of Appendix A. The Multibus-Ringbus interaction is complicated, detailed, and very dependent on the implementation. This is the reason that we simplified the interaction in our abstract view.

^ Since there may be zero time between the termination of a grant and the next nonnull request from a slice in our abstract Ringbus, the arbiter cannot unambiguously differentiate between a continuing grant and a new nonnull request of the same type if the duration of a request is determined by the interval until the request is removed. In the Concert system there is at least one clock period of dead time between successive nonnull requests from the same slice to prevent this ambiguity.

We interpret the Ringbus in a broad sense. We define the Ringbus to be a ring of independent bus segments in which adjacent bus segments may be connected to form longer buses. Associated with each bus segment interconnection point is a slice, which is connected to the segments via an access path. All Ringbus accesses originate and terminate at slices. The interconnection of the bus segments occurs in real time under the control of a central arbiter in response to requests originating from slices for paths to other slices. We assume that the arbiter operates in discrete time (although it need not in all cases).

Different Ringbus designs are distinguished by 1) the number of bus segments (which is equal to the number of slices), 2) the access paths between the slices and bus segments, and 3) the arbitration algorithm. In this chapter we only consider Ringbus designs with an even number of slices. In addition, we only consider two different types of access paths: asymmetrical and symmetrical. The Ringbus design utilized in Concert has asymmetrical access paths (as discussed in section 1.2.2; see also Figure 3.1). Hereafter we call this particular Ringbus design, minus the arbitration algorithm, the Asymmetric Ringbus. These asymmetrical access paths impose an unnecessary performance limitation. As discussed in section 1.2.2, counterclockwise accesses on the Asymmetric Ringbus require two segments in addition to the segments between the source and destination slices. Symmetrical access paths remove this performance limitation. A Symmetric Ringbus is an Asymmetric Ringbus with symmetrical access paths instead of asymmetrical access paths. Figure 3.1 illustrates the access paths of the Asymmetric Ringbus and the Symmetric Ringbus. We define the Concert Ringbus to be the Ringbus and arbitration algorithm actually used in the Concert system. That is, the Concert Ringbus is an Asymmetric Ringbus with a rotating priority arbitration algorithm† (as discussed in section 1.2.3).

† Anderson [A2] actually calls this arbitration algorithm a rotating priority, full arbitration arbitration algorithm to distinguish it from others he considered during the design of Concert. We will call it simply a rotating priority arbitration algorithm.


Figure 3.1: Access paths of the Asymmetric Ringbus and the Symmetric Ringbus (panels: asymmetrical access paths, symmetrical access paths; labels: clockwise and counterclockwise Ringbus segments)

As stated earlier, our chief interest is the optimum performance of the Ringbus. Since the Symmetric Ringbus is a superset of the Asymmetric Ringbus, the optimum performance of the Symmetric Ringbus is greater than or equal to that of the Asymmetric Ringbus. For this reason, we concentrate on the optimum performance of the Symmetric Ringbus in this chapter. The Symmetric Ringbus is also easier to analyze since it has more symmetry. In the course of determining the optimum performance we also determine the optimal arbitration algorithm, which is of interest in designing good sub-optimal algorithms.

We briefly consider the optimum performance of the Asymmetric Ringbus for a small number of slices. In addition, we determine the performance of the Concert Ringbus and the performance of the Symmetric Ringbus with the rotating priority arbitration algorithm. A trivial modification to the arbiter in the actual Concert system (which we call the Concert Ringbus arbiter) allows this algorithm to operate with symmetrical access paths. (The additional complexity and circuitry required in the RIB might not be judged trivial.) The problem that symmetrical access paths pose for the arbiter is that conflicts may now occur at the request destination as well as at the Ringbus segments. Thus the arbiter must arbitrate the destinations as well as the Ringbus segments.

To include this feature, the arbiter just needs to arbitrate for each Ringbus resource (segment or destination) in the same manner in which the arbitration proceeded for the segments in the Concert Ringbus arbiter (see section 1.2.3). The first step is to determine all the Ringbus resources required for each request. As in the Concert Ringbus arbiter, requests would be granted only in the direction requiring the smallest number of segments, with ties being broken in favour of the clockwise direction. Finally, a request would be granted when it has been granted all the resources that it requires.
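The three-step arbitration just described (determine each request's resources, arbitrate every resource independently, grant a request only if it won all of its resources) can be sketched as follows. Several details are our own assumptions rather than the Concert design of section 1.2.3: slices are numbered 0 to S − 1 with segment k joining slices k and k + 1, a request's resources are the segments on its shorter path (ties clockwise) plus its destination slice, and a single rotating priority pointer `start` orders the slices for every resource. `path_resources` and `rotating_priority_grants` are hypothetical names.

```python
def path_resources(src, offset, s):
    """Resources needed by a request from slice src to slice (src + offset) mod s
    on a Symmetric Ringbus: the segments on the shorter path (ties clockwise)
    plus the destination slice. Hypothetical numbering: segment k joins slices
    k and k + 1."""
    cw = offset % s   # clockwise distance in segments
    ccw = s - cw      # counterclockwise distance in segments
    if cw <= ccw:
        segs = {("seg", (src + k) % s) for k in range(cw)}
    else:
        segs = {("seg", (src - 1 - k) % s) for k in range(ccw)}
    return segs | {("dest", (src + offset) % s)}


def rotating_priority_grants(requests, s, start, busy=frozenset()):
    """requests maps a source slice to its destination offset. Every resource is
    arbitrated independently: it goes to the first requester in the priority
    order start, start + 1, ... (mod s). A request is granted only if it wins
    all of its resources and none of them is held by an ongoing grant (busy)."""
    need = {src: path_resources(src, off, s) for src, off in requests.items()}
    owner = {}
    for k in range(s):
        src = (start + k) % s
        for res in need.get(src, ()):
            owner.setdefault(res, src)  # the highest-priority claimant keeps it
    return {src for src, res in need.items()
            if all(owner[r] == src and r not in busy for r in res)}


requests = {0: 1, 1: 1, 2: -1, 3: 2}  # source slice -> destination offset (S = 4)
print(rotating_priority_grants(requests, 4, start=0))  # {0, 1}
print(rotating_priority_grants(requests, 4, start=2))  # {2}
```

With `start=2` only slice 2 is granted: slice 3 wins segment 0 but loses destination 1, and its claim on segment 0 still blocks slice 0. Resource-by-resource arbitration of this kind can therefore leave grantable requests idle, which is one way a rotating priority arbiter can fall short of an optimal one.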

A logic diagram for this new arbiter is shown in Figure 3.2. The part count has doubled because we now have double the number of Ringbus resources to arbitrate. However, the size of the parts required is the same. The number of parts is proportional to the number of resources, and the size of the parts is exponential in the number of sources.

This new arbiter design, which evidently was overlooked during the design of the Concert system, would result in superior or equivalent performance in all cases. (It certainly cannot result in inferior performance, since its functionality is a superset of the other's.)

In section 3.2 we formulate the Ringbus as a discrete time probabilistic model. Time is quantized into discrete intervals, called rounds, which are equal to and synchronous with the arbiter clock period. The performance metric of the Ringbus model is the throughput, in terms of the average number of grants completed per round. The optimum performance of the Ringbus model is formulated as a Markovian decision problem.

In sections 3.3 and 3.4 we investigate the optimal arbiter for a Ringbus of four slices. Section 3.3 covers grant durations of one round, and section 3.4 covers grant durations greater than one round for two special cases. These special cases are deterministic grant durations and geometrically distributed grant durations.

In section 3.5 we investigate the optimal arbiter for a Ringbus of six slices and develop a number of bounds on the optimum throughput.

Sections 3.6 and 3.7 consider the Ringbus with eight and more slices. Since the computational requirements for these cases exceed the available resources, we just discuss the general characteristics of the optimum throughput in section 3.6 and the optimum throughput for some special cases in section 3.7.

In section 3.8, we compare the performance of the optimum arbiter algorithms and the rotating priority arbiter algorithm for the Concert and Symmetric Ringbuses.

Finally, in section 3.9 we discuss some of the differences between our abstract Ringbus model and the Ringbus utilized in Concert. We consider the effect that these differences have on performance. The last part of this section develops the hooks for the integration of the isolated Ringbus model with the Multibus model.
3.2 Ringbus Model Formulation

From the single processor equivalent model of the Multibus, we know that if a Ringbus access occurs in some round, then with probability $p_i$ its destination is $i$ slices around the Ringbus from the source slice ($i = -(S/2 - 1), \ldots, -1, 1, 2, \ldots, S/2$). Negative values of $i$ indicate the counterclockwise direction around the Ringbus and positive values indicate the clockwise direction. Note that this probability distribution of requests is independent of the source slice. This is a consequence of our assumption that all Multibus models, and hence all the single processor equivalent models of the Multibus, are identical. Since we assumed an exponential distribution for the processing time of the single processor equivalent model of the Multibus, the probability that the next request at a slice arrives in the $k$th round after the end of the previous grant at that slice, given that it has not arrived in an earlier round, is a constant independent of $k$. In other words, the number of rounds between the end of a grant and the next request at that same slice (i.e. the discretized processing time of the single processor equivalent model) is a geometric random variable. Because of the memoryless property of a geometric random variable, we can exclude from the state description any information on the number of rounds waited so far for a request to arrive at a slice. Thus the assumption of an exponential distribution for the processing time of the single processor equivalent model simplifies not only the integration of the Multibus and Ringbus models but also the analysis of the Ringbus model.
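The memoryless argument above can be checked numerically. In this sketch (our notation: `c` is the arbiter clock period and `t_p` the mean exponential processing time, with illustrative values 0.2 and 0.8), a request arriving at time $T$ after the end of a grant is first seen at rising edge $K = \lceil T/c \rceil$. Then $P(K = k) = e^{-(k-1)c/t_p} - e^{-kc/t_p}$, and the conditional probability $P(K = k \mid K \ge k)$ equals $1 - e^{-c/t_p}$ for every $k$, so $K$ is geometric.

```python
import math
import random


def arrival_round_probs(c, t_p, k_max):
    """P(K = k) for k = 1..k_max, where K = ceil(T / c) is the rising edge at
    which a request arriving at exponential time T (mean t_p) is first seen."""
    def cdf(t):
        return 1.0 - math.exp(-t / t_p)
    return [cdf(k * c) - cdf((k - 1) * c) for k in range(1, k_max + 1)]


def conditional_arrival_probs(c, t_p, k_max):
    """P(K = k | K >= k): the chance that the request shows up in round k given
    that it has not shown up earlier. Constant when K is geometric."""
    out, survival = [], 1.0
    for p in arrival_round_probs(c, t_p, k_max):
        out.append(p / survival)
        survival -= p
    return out


c, t_p = 0.2, 0.8                 # illustrative values (e.g. in microseconds)
q = 1.0 - math.exp(-c / t_p)      # predicted per-round arrival probability
print(q, conditional_arrival_probs(c, t_p, 5))

# Monte Carlo check of P(K = 1) against q.
random.seed(1)
samples = [max(1, math.ceil(random.expovariate(1.0 / t_p) / c)) for _ in range(100_000)]
print(sum(k == 1 for k in samples) / len(samples))
```

Because the conditional probability does not depend on how many rounds have already elapsed, the state needs no record of the time already spent waiting for a request to arrive.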

In each round the arbiter must decide which subset of the current requests to grant based on past and present information only. The arbiter is thus a causal, discrete time decision maker. Decisions are subject to the following constraints:

1. All segments required by a request must be connected as required before or at the same time that the request is granted.

2. Each segment is used for no more than one grant in a round.

3. All segments required by a grant remain connected and allocated for the exclusive use of that grant for the entire duration of the grant.

4. Every pending request eventually gets granted, i.e. each request has a bounded waiting time. (This requires a bounded Ringbus access time. In the Concert system each Ringbus access represents a single memory transaction (read, write, or read-modify-write), and the duration of each such transaction is bounded by the Ringbus timeout period.†)

Without loss of generality, we consider the segments referred to in Constraint 1 to be connected at the time that a request is granted. This is in fact how the Ringbus operates in Concert.

† If the addressed memory location at the destination RIB does not respond with an acknowledgment within a given time interval, the destination RIB sends a signal to the source RIB which aborts the Ringbus access.
We make the following three simplifications in our formulation of the Ringbus model:

1. We exclude from the state description any information on the duration that a request waits before being granted. This waiting time information is irrelevant when

i) modeling the performance of the only non-optimal arbiter algorithm considered in this chapter, the rotating priority arbiter algorithm, and

ii) determining the optimum performance of the Ringbus without Constraint 4.

The waiting time information is irrelevant in case i) because the rotating priority algorithm does not utilize this information. We have not presented sufficient machinery at this point to show that the request waiting time information is irrelevant in case ii). In fact, we have not even completed the formulation of the Ringbus model. Therefore we relegate a precise statement and proof of the irrelevance of this history information, which we call Theorem 3.1, to Appendix B, and encourage the reader to examine this theorem after completing subsection 3.2.1.

2. We ignore Constraint 4 when pursuing the optimum performance of the Ringbus. Our reasons are as follows. First, by ignoring Constraint 4, request waiting time information may be excluded from the state description (as justified by Theorem 3.1), thus permitting the analysis to be greatly simplified. Second, ignoring Constraint 4 removes the effect of the maximum permissible waiting time on the optimum performance, so that the optimum performance obtained is the inherent optimum performance of the Ringbus architecture. If the maximum permissible waiting time is sufficiently large, Constraint 4 has negligible effect. If it is sufficiently small (such as equal to its minimum value of $(S-1)D$, where $S$ is the number of slices and $D$ is the maximum duration of an access), Constraint 4 has an enormous effect on the performance. In fact, with a maximum permissible waiting time of $(S-1)D$, the arbiter algorithm must impose some sort of strict priority ordering on requests. Third, any arbiter algorithm can easily be modified to ensure bounded waiting times. Such a modification may, of course, result in a degradation of performance dependent on the maximum permissible waiting time.

Note that assuming a sufficiently large maximum permissible waiting time is essentially equivalent to ignoring Constraint 4. We prefer to think of ignoring Constraint 4 as assuming such a large maximum waiting time.

3. We restrict the duration of a grant to one of the following two simple forms:

i) a constant duration of $d$ rounds, where $d = 1, 2, 3$, or $4$;

ii) a geometric probability distribution, i.e. the duration is $d$ rounds, where $d$ is a random variable with a (memoryless) geometric distribution.
3.2.1 Markovian Decision Formulation

Let the state of the Ringbus (ignoring request waiting times as discussed previously) at the beginning of each round be described by

$(r_1, d_1, \ldots, r_i, d_i, \ldots, r_S, d_S)$

where $r_i$ denotes the destination of the request at slice $i$ and $d_i$ indicates the duration for which the request has been granted so far.

We express the destination of a request as the number of slices the destination slice lies around the Ringbus relative to the source slice. We use positive numbers to indicate the clockwise direction from the source and negative numbers to indicate the counterclockwise direction from the source. Thus $r_i = 2$ indicates a request to the slice two slices along the Ringbus in the clockwise direction from the source slice, and $r_i = -2$ indicates a request to the slice two slices along the Ringbus in the counterclockwise direction from the source slice.

We do not use $r_i = 0$ to indicate a request from slice $i$ to itself. We assumed earlier that there are no global register accesses; hence such requests do not occur. Instead, we use $r_i = 0$ to indicate that slice $i$ is not requesting a Ringbus destination. We call this absence of a request a null request. The arbiter treats a null request just like a genuine request, except that 1) a null request is always granted immediately when it occurs (since there are no resources to be granted for a null request) and 2) a null request always has a duration of only one round. Any two consecutive genuine requests at a slice are separated by some number (possibly zero) of null requests, proportional to the processing time between those genuine requests.

A request from slice $i$ is pending (i.e. not yet granted) if and only if $d_i = 0$. The duration $d_i$ is increased by one for each round that the request remains granted. We express the destination of any pending request in terms of the smallest number of slices, in either the clockwise or the counterclockwise direction, by which the destination slice is separated from the source slice. A tie in the number of slices in each direction is broken in favour of the clockwise direction. Thus for any pending request, if the source slice is $i$ and the destination slice is $j \neq i$, then

$$r_i = \begin{cases} x, & x \le S/2 \\ x - S, & S/2 < x \end{cases}$$

where $x = (j - i) \bmod S$.

A request may be granted in either the clockwise or the counterclockwise direction. Once a request is granted, we express its destination in terms of the direction in which it was granted. Thus if a request is granted from slice $i$ to slice $j \neq i$,

$$r_i = \begin{cases} x, & \text{if granted in the clockwise direction} \\ x - S, & \text{if granted in the counterclockwise direction} \end{cases}$$

where $x = (j - i) \bmod S$.
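The two encodings of $r_i$ can be written as a pair of small functions. This is a minimal Python sketch, assuming slices numbered $1, \ldots, S$ clockwise and $S$ even; the function names are hypothetical.

```python
def pending_destination(i, j, S):
    """r_i for a pending request from slice i to slice j on a Ringbus of
    S slices (S even): the signed shortest distance, with a tie at S/2
    broken in favour of the clockwise direction."""
    if i == j:
        raise ValueError("r_i = 0 denotes a null request, not a request to itself")
    x = (j - i) % S                     # clockwise distance, 1..S-1
    return x if x <= S // 2 else x - S

def granted_destination(i, j, S, clockwise):
    """r_i once the request has been granted in the given direction."""
    if i == j:
        raise ValueError("no request from a slice to itself")
    x = (j - i) % S
    return x if clockwise else x - S

# Four slices numbered 1..4 clockwise.
print(pending_destination(1, 3, 4))                    # 2 (tie goes clockwise)
print(pending_destination(1, 4, 4))                    # -1
print(granted_destination(1, 3, 4, clockwise=False))   # -2
```

Python's `%` always returns a non-negative result for a positive modulus, so `x` lies in $1, \ldots, S-1$ whatever the slice numbering wraps through.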

Therefore, once a request is granted, we use $r_i$ to indicate which segments have been allocated to that request. For the Symmetric Ringbus, the mapping from $r_i$ to the segments is especially easy: $|r_i|$ indicates the number of segments allocated beginning from slice $i$, and the sign of $r_i$ indicates the direction around the Ringbus in which these segments are allocated. For the Asymmetric Ringbus, the mapping is the same except that two additional segments are required for requests granted in the counterclockwise direction: the segment immediately clockwise of the source slice $i$ and the segment immediately counterclockwise of the destination slice $j$. (Thus a request from a slice to its immediate clockwise neighbour, i.e. from slice $i$ to slice $(i \bmod S) + 1$, can be granted in only one direction: a counterclockwise grant would need $(S - 1) + 2 = S + 1$ segments, more than the $S$ segments of the Ringbus.)
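The request-to-segment mapping just described can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical numbering in which slices are $1, \ldots, S$ clockwise and segment $k$ joins slice $k$ to slice $(k \bmod S) + 1$; the thesis does not fix a segment numbering here.

```python
def segments_for_grant(i, r, S, asymmetric=False):
    """Segments allocated to a grant with signed destination r from slice i.

    Hypothetical numbering: slices 1..S clockwise; segment k joins slice k
    to slice (k mod S) + 1. Returns None when the grant is impossible.
    """
    def wrap(k):                        # map any integer onto 1..S
        return (k - 1) % S + 1

    if r > 0:                           # clockwise: segments i, i+1, ..., i+r-1
        return [wrap(i + t) for t in range(r)]
    if r < 0:                           # counterclockwise: segments i-1, ..., i-|r|
        segs = [wrap(i - 1 - t) for t in range(-r)]
        if asymmetric:
            j = wrap(i + r)             # destination slice
            # extra segments: clockwise of the source, counterclockwise of the destination
            segs = [wrap(i)] + segs + [wrap(j - 1)]
            if len(set(segs)) < len(segs):
                return None             # would need some segment twice
        return segs
    return []                           # null request: no segments

print(segments_for_grant(1, 2, 4))                      # [1, 2]
print(segments_for_grant(1, -1, 4))                     # [4]
print(segments_for_grant(1, -1, 4, asymmetric=True))    # [1, 4, 3]
print(segments_for_grant(1, -3, 4, asymmetric=True))    # None
```

The last call reproduces the parenthetical remark above: on the Asymmetric Ringbus a counterclockwise grant from slice 1 to its clockwise neighbour, slice 2, would need segment 1 twice.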

An example of the definition of $r_i$ is illustrated in Figure 3.3.


Symmetric  Ringbus 


Concert  Ringbus 


K 


requests  granted 


\ 

_ Two  additional 

segments  required 


r.  =  2-  r*  = 


Figure  3.3:  Fxamplcs  of  r/ 

In some cases the state description simplifies. If all grants have a constant duration of one round, then all the $d_i$ can be eliminated from the state description, since a new request (either genuine or null) always replaces a request once it has been granted. If all grants of genuine requests have a duration with a geometric distribution, then we only need a binary variable for $d_i$. As before, $d_i = 0$ indicates a pending request; $d_i = 1$ indicates that a request has been granted for one or more rounds. The exact duration of the grant in this case is irrelevant, since the geometric distribution of the duration is memoryless (i.e. independent of how long the request has been granted so far).
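The memoryless property that lets $d_i$ shrink to a binary flag is easy to check. This is a minimal Python sketch, assuming a geometric grant duration in which the grant ends at the end of each round with a fixed probability `theta`; `theta` is a hypothetical parameter name, not the thesis's notation. Exact fractions make the equality test exact.

```python
from fractions import Fraction

def survival(theta, n):
    """P(D > n): the grant is still in progress after n rounds when it ends
    independently with probability theta at the end of each round."""
    return (1 - theta) ** n

def conditional_survival(theta, m, n):
    """P(D > m + n | D > m): probability of lasting n more rounds given
    that the grant has already lasted m rounds."""
    return survival(theta, m + n) / survival(theta, m)

theta = Fraction(1, 3)   # hypothetical per-round end probability
for m in range(6):
    # The elapsed duration m has no influence on the remaining duration.
    assert conditional_survival(theta, m, 2) == survival(theta, 2)
print(survival(theta, 2))   # 4/9
```

Because the conditional probability does not depend on `m`, the arbiter only needs to know whether a grant is pending or in progress, not how long it has been in progress.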

When a grant at a slice terminates (after one round for a null request and one or more rounds for a non-null request), a new request arrives at that slice at the beginning of the following round. We denote by $p_l$ the probability that a new request is for a destination $l$ slices along the Ringbus from the source slice, $l = -(S/2 - 1), \ldots, -1, 1, 2, \ldots, S/2$. As before, negative values of $l$ indicate the counterclockwise direction around the Ringbus from the source and positive values of $l$ indicate the clockwise direction. We denote by $p_0$ the probability that a new request is a null request. Thus $\sum_{l \neq 0} p_l = 1 - p_0$.

Given some current state, the next state of the Ringbus depends on the present state, the decisions made in the present state, and the new requests that arrive in the present round. The states of the Ringbus thus form a discrete-time Markov chain. The state transition probabilities depend on the state and on the decision made in that state. Note that going from the present state to the next state has two parts, a deterministic part and a random part. The deterministic part is determined by the decision in the present state: any requests ungranted in the present state, or corresponding to grants still in progress in the present state, must appear in the next state. The random part is determined by the new requests that arrive to replace the grants that terminated in the present state.
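For one-round grants, the split into a deterministic and a random part can be made concrete. This is a minimal Python sketch, assuming slices indexed from 0, a dictionary `p` of request probabilities, and that the set of granted slices is supplied by the caller; checking the logical and topological constraints on that set is outside the sketch, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def next_state_distribution(state, granted, p):
    """Next-state probabilities for one-round grants.

    state   : tuple of r_i, slices indexed from 0 (0 = null request)
    granted : indices of the slices whose genuine requests are granted
    p       : dict mapping each destination l to its probability p_l
    """
    # Null requests are always granted, so their slices also receive new requests.
    renewed = set(granted) | {i for i, r in enumerate(state) if r == 0}
    options = []
    for i, r in enumerate(state):
        if i in renewed:
            options.append(list(p.items()))   # random part: a fresh request arrives
        else:
            options.append([(r, 1)])          # deterministic part: request carries over
    dist = {}
    for combo in product(*options):
        nxt = tuple(r for r, _ in combo)
        prob = 1
        for _, pr in combo:
            prob *= pr
        dist[nxt] = dist.get(nxt, 0) + prob
    return dist

# State 20 of Table 3.1, (-1, 2, 1, 0), with its +1 request (slice index 2) granted.
p = {-1: Fraction(1, 8), 0: Fraction(1, 2), 1: Fraction(1, 8), 2: Fraction(1, 4)}
dist = next_state_distribution((-1, 2, 1, 0), granted={2}, p=p)
print(len(dist))                  # 16
print(dist[(-1, 2, -1, -1)])      # 1/64
```

The two ungranted requests reappear unchanged, and the two renewed slices produce the $4 \times 4$ next states, each with a product of request probabilities.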

For convenience, we number the states with consecutive integers starting from 1, and we number the possible decisions in each state with consecutive integers starting from 1. We denote the one-step transition probability from state $i$ to state $j$ by $p_{ij}^k$, where $k$ indicates the decision made in state $i$. Denote the decision made in state $i$ by $d(i)$ and let $D = [d(1), d(2), d(3), \ldots]$. We call the decision vector $D$ a policy: it specifies the decision made in each state, and thus completely specifies the operation of the arbiter. We consider only stationary policies, i.e. policies that are independent of time. In addition, we consider only policies in which there is at least one new grant or grant in progress in each state, except for the state with $r_i = 0$ for all $i$. (A new grant is a grant that has a duration of zero so far: it has been granted for the first time in that state. A grant in progress is a grant that has a duration so far of one or more rounds: it was first granted in some previous state.) We assume that $p_l$ is nonzero for all $l = -(S/2 - 1), \ldots, -1, 0, 1, \ldots, S/2$. The above restriction on admissible policies and this assumption of nonzero probabilities ensure the following:

1) All states in the Markov chain communicate, i.e. for all $i$ and $j$ the $n$-step transition probability from state $i$ to state $j$ is nonzero for some $n \ge 1$. The Markov chain thus forms a single closed class.

2) The Markov chain is aperiodic (for example, the state with $r_i = 0$ for all $i$ can return to itself in a single round, since $p_0 > 0$).


These two conditions ensure that the finite Markov chain has a stationary steady-state distribution (Theorem 2, p. 29 of Kleinrock [K3]). Denote the steady-state probability of being in state $i$ under policy $D$ by $\pi_i^D$. The $\pi_i^D$ are given by

$$\pi_j^D = \sum_i \pi_i^D p_{ij}^{d(i)} \quad \text{and} \quad \sum_i \pi_i^D = 1 \qquad (3.1)$$

We call the number of new grants in state $i$ under decision $d(i)$ the reward, which we denote by $q_i^{d(i)}$. The average number of new grants per round under policy $D$ is

$$g^D = \sum_i \pi_i^D q_i^{d(i)} \qquad (3.2)$$

where $D = [d(1), d(2), \ldots]$. The average number of new grants per round is the throughput of the Ringbus. Our objective is to find the maximum throughput, $g^{opt}$, and the corresponding policy $D$, subject to given constraints on the decisions and for given probabilities.
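Equations 3.1 and 3.2 are linear algebra once a policy is fixed. The sketch below is a minimal Python/NumPy illustration, assuming the rows $p_{ij}^{d(i)}$ of one fixed policy are given as a dense matrix; the two-state chain in the usage lines is hypothetical and only keeps the numbers small.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi_j = sum_i pi_i p_ij with sum_i pi_i = 1 (equation 3.1).
    P holds the rows p_ij^{d(i)} for one fixed policy D."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0          # replace one redundant balance equation by the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def gain(P, q):
    """Average number of new grants per round, g^D = sum_i pi_i q_i (equation 3.2)."""
    return float(stationary_distribution(P) @ np.asarray(q, dtype=float))

# Hypothetical two-state chain under a fixed policy.
P = [[0.5, 0.5],
     [1.0, 0.0]]
q = [1.0, 2.0]
print(stationary_distribution(P))   # approximately [0.6667, 0.3333]
print(gain(P, q))                   # approximately 1.3333 (= 4/3)
```

Replacing one balance equation by the normalization is valid because, for a single closed class, the balance equations have exactly one redundant row.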

The constraints on the decisions fall into three classes, which we term logical, topological, and theoretical. The logical constraints, which we discussed at the beginning of section 3.2, impose certain basic conditions on the Ringbus segments, independent of the arbiter algorithm and the Ringbus design. The topological constraints impose the mapping from a request to the segments required for that request. Different Ringbus topologies, and in particular different access paths, can be expressed as different request-to-segment mappings. The Asymmetric Ringbus and the Symmetric Ringbus differ only in their mapping of counterclockwise requests to segments: the Asymmetric Ringbus requires two more segments than the Symmetric Ringbus. The theoretical constraints ensure smooth application of the Markovian decision formulation. The limitation to stationary policies is of no concern, since any real arbiter implementation would likely operate independently of time anyway. Likewise, the limitation to policies with at least one grant in every state (except for the state with $r_i = 0$ for all $i$) is of no concern, since any optimal arbiter would obviously make at least one grant per round wherever possible. Without this limitation, the Markovian decision problem might have multiple chains and transient states, which would complicate the analysis.

The optimal throughput and corresponding policy of the Markovian decision model of the Ringbus can be found using Howard's policy-iteration method [H4]. Following Howard [H4], we first develop some preliminary results for later use and then present Howard's algorithm.

Suppose we run our Markov chain model of the Ringbus with rewards for $n$ rounds under some policy $D$. Let $V_i^D(n)$ denote the total expected reward (i.e. the total number of new grants) accumulated over the $n$ rounds given that we start in state $i$. $V_i^D(n)$ obeys the recurrence relation

$$V_i^D(n) = q_i^{d(i)} + \sum_{j \in I} p_{ij}^{d(i)} V_j^D(n-1), \quad n \ge 1 \qquad (3.3)$$

where $I$ is the set of all states. Howard has shown that $V_i^D(n)$ has the asymptotic form

$$V_i^D(n) \approx n g^D + v_i^D \quad \text{as } n \to \infty \qquad (3.4)$$

$v_i^D$ represents the value of starting in state $i$: $v_i^D - v_j^D$, $i \neq j$, is the difference in the long-run expected reward due to starting in state $i$ rather than state $j$. Substituting equation 3.4 into equation 3.3 and using $\sum_j p_{ij}^{d(i)} = 1$, we obtain

$$g^D + v_i^D = q_i^{d(i)} + \sum_j p_{ij}^{d(i)} v_j^D \qquad (3.5)$$

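If there are $N$ states, equation 3.5 represents $N$ simultaneous equations in $N + 1$ unknowns. We rectify this situation by subtracting $v_1^D$ from both sides of equation 3.5 and regarding $g^D$ and the $v_i^D - v_1^D$ as the unknowns (since $v_1^D - v_1^D = 0$, this leaves $N$ unknowns):

$$g^D + (v_i^D - v_1^D) = q_i^{d(i)} + \sum_j p_{ij}^{d(i)} (v_j^D - v_1^D) \qquad (3.6)$$

We call the $v_i^D - v_1^D$ the relative values. We can solve equation 3.6 for $g^D$ and the relative values. Note that we now have an equivalent form for $g^D$, namely equation 3.6 with $i = 1$:

$$g^D = q_1^{d(1)} + \sum_j p_{1j}^{d(1)} (v_j^D - v_1^D) \qquad (3.7)$$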
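The asymptotic form 3.4 can be observed directly by iterating the recurrence 3.3. This is a minimal Python/NumPy sketch on a hypothetical two-state chain under a fixed policy, with state index 0 playing the role of state 1. Successive differences $V(n+1) - V(n)$ approach $g^D$, and $V_i(n) - V_1(n)$ approaches the relative values.

```python
import numpy as np

def total_expected_reward(P, q, n):
    """V(n) from the recurrence V_i(n) = q_i + sum_j p_ij V_j(n-1), V(0) = 0 (eq. 3.3)."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    V = np.zeros_like(q)
    for _ in range(n):
        V = q + P @ V
    return V

# Hypothetical two-state chain under a fixed policy.
P = [[0.5, 0.5],
     [1.0, 0.0]]
q = [1.0, 2.0]
V200 = total_expected_reward(P, q, 200)
V201 = total_expected_reward(P, q, 201)
print(V201 - V200)      # each entry approaches g = 4/3
print(V200 - V200[0])   # approaches the relative values [0, 2/3]
```

For this chain $g^D = 4/3$ and $v_2^D - v_1^D = 2/3$ satisfy equation 3.6 exactly, which is what the iteration converges to.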

Howard's policy iteration algorithm is the following:

1) Start with some policy $D$.

2) Value Determination: Use the $p_{ij}^{d(i)}$ and $q_i^{d(i)}$ for the given policy $D$ to solve

$$g^D + (v_i^D - v_1^D) = q_i^{d(i)} + \sum_j p_{ij}^{d(i)} (v_j^D - v_1^D) \qquad (3.8)$$

for $g^D$ and the relative values $v_i^D - v_1^D$.

3) Policy Improvement: For each state $i$, use the relative values $v_j^D - v_1^D$ from the previous policy and determine the value or values of $k$ that satisfy

$$\max_k \left[ q_i^k + \sum_j p_{ij}^k (v_j^D - v_1^D) \right] \qquad (3.9)$$

If a unique value of $k$ satisfies equation 3.9, then set $d'(i) = k$. If two or more values of $k$ satisfy equation 3.9, then either one such value of $k$ is $d(i)$ or no such value of $k$ is $d(i)$. In the former case, set $d'(i) = d(i)$; in the latter case, set $d'(i)$ equal to an arbitrarily chosen value of $k$ satisfying equation 3.9. The new decision in state $i$ is $d'(i)$.

4) If policy $D'$ is the same as policy $D$ (i.e. if $d'(i) = d(i)$ for all $i$), then stop: $D$ is the optimal policy and $g^D$ is the optimal average reward per round. If policy $D'$ is not the same as policy $D$, then set $D = D'$ and go to step 2.

With precise arithmetic, $g^D$ increases monotonically on each iteration and Howard's algorithm terminates in a finite number of iterations [H4]. However, truncation errors can cause indefinite cycling of the algorithm in a machine implementation.
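The four steps above can be sketched compactly. This is a minimal Python/NumPy illustration, not the implementation used in the thesis. It assumes a unichain problem (which the restrictions on admissible policies guarantee), dense transition vectors, and state index 0 in the role of state 1. The tie tolerance `tol` is an added assumption meant to guard against the round-off cycling just mentioned. The toy problem in the usage lines is hypothetical.

```python
import numpy as np

def policy_iteration(P, q, policy, max_iter=1000, tol=1e-12):
    """Howard's policy iteration for an average-reward Markovian decision problem.

    P[i][k] : next-state probability vector when decision k is made in state i
    q[i][k] : immediate reward (new grants) for decision k in state i
    policy  : initial decision index for each state
    """
    n = len(P)
    policy = list(policy)
    for _ in range(max_iter):
        # Value determination (eq. 3.8): unknowns g and w_i = v_i - v_1, with w_1 = 0.
        A = np.zeros((n, n))
        b = np.zeros(n)
        for i in range(n):
            k = policy[i]
            row = -np.asarray(P[i][k], dtype=float)
            row[i] += 1.0                   # coefficients of w_i - sum_j p_ij w_j
            A[i, 0] = 1.0                   # column 0 holds g, since w_1 is fixed at 0
            A[i, 1:] = row[1:]
            b[i] = q[i][k]
        x = np.linalg.solve(A, b)
        g, w = x[0], np.concatenate(([0.0], x[1:]))
        # Policy improvement (eq. 3.9), keeping the current decision on (near-)ties.
        new_policy = []
        for i in range(n):
            tests = [q[i][k] + np.dot(P[i][k], w) for k in range(len(P[i]))]
            if tests[policy[i]] >= max(tests) - tol:
                new_policy.append(policy[i])
            else:
                new_policy.append(int(np.argmax(tests)))
        if new_policy == policy:
            return g, w, policy
        policy = new_policy
    raise RuntimeError("no convergence; possible cycling caused by round-off")

# Hypothetical toy problem: two decisions in state 0, one in state 1.
P = [[[0.5, 0.5], [0.1, 0.9]],
     [[1.0, 0.0]]]
q = [[1.0, 0.0],
     [2.0]]
g, w, pol = policy_iteration(P, q, policy=[1, 0])
print(pol, g, w)   # [0, 0], about 1.3333, about [0, 0.6667]
```

Starting from the poorer decision in state 0 (gain $0.9 \cdot 2 / 1.9 \approx 0.947$), one improvement step switches to the better decision and the second pass confirms it, giving $g = 4/3$.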
3.2.2 Optimal Arbiter for Four Slices and Grant Duration of One Round

In this section we investigate the optimal arbiter for the Symmetric Ringbus with four slices and a grant duration of one round. In this case the state description is

$(r_1, r_2, r_3, r_4)$

where $r_i = -1, 0, 1$, or $2$; $i = 1, 2, 3, 4$. We assume that the request probabilities are symmetrical with respect to the direction around the Ringbus, i.e. $p_1 = p_{-1}$. There are $4^4 = 256$ states in this state description. However, this number can be reduced by taking advantage of the abundant symmetry present. There are two types of symmetry, which we term rotational and flip. These symmetry types are most conveniently viewed geometrically. Imagine the Ringbus represented by four nodes (each representing a slice) connected by arcs (each representing a bus segment) to form a planar diamond shape, which has three axes of symmetry: one perpendicular to the plane and two in the plane of the diamond. Rotational symmetry refers to the symmetry about the axis perpendicular to the plane of the Ringbus. Flip symmetry refers to the symmetry about one of the axes in the plane of the Ringbus. Because of the rotational symmetry, it does not matter which axis in the plane is chosen for the flip symmetry axis. An example of each symmetry type is illustrated in Figure 3.4.


[Figure: (a) Rotational symmetry; (b) Flip symmetry]

Figure 3.4: Rotational and flip symmetry

Since the request probabilities are identical for each slice and symmetrical with respect to the direction around the Ringbus, by employing both rotational and flip symmetry all eight states $(\pm 1, 0, 0, 0)$, $(0, \pm 1, 0, 0)$, $(0, 0, \pm 1, 0)$, $(0, 0, 0, \pm 1)$ can be seen to be equivalent to $(-1, 0, 0, 0)$. Thus we can replace these eight states by a single equivalent state $(-1, 0, 0, 0)$. By extracting all available symmetry, we eventually end up with a total of 43 states. These states are listed in Table 3.1, along with the number of original states that reduced to each equivalent state.


State Number   Equivalent State   Reduction Factor
      1          0  0  0  0              1
      2         -1  0  0  0              8
      3          2  0  0  0              4
      4         -1 -1  0  0              8
      5         -1  0 -1  0              4
      6         -1  0  1  0              4
      7         -1  0  2  0              8
      8         -1  1  0  0              4
      9         -1  2  0  0              8
     10          1 -1  0  0              4
     11          1  2  0  0              8
     12          2  2  0  0              4
     13          2  0  2  0              2
     14         -1 -1 -1  0              8
     15         -1 -1  1  0              8
     16         -1 -1  2  0              8
     17         -1  1 -1  0              8
     18         -1  2 -1  0              8
     19         -1  1  2  0              8
     20         -1  2  1  0              4
     21         -1  2  2  0              8
     22          1 -1 -1  0              8
     23          1  1  2  0              8
     24          1  2 -1  0              4
     25          1 -1  2  0              8
     26          1  2  2  0              8
     27          2 -1  2  0              8
     28          2  2  2  0              4
     29         -1 -1 -1 -1              2
     30         -1 -1 -1  1              8
     31         -1 -1 -1  2              8
     32         -1 -1  1  1              4
     33         -1 -1  1  2              8
     34         -1 -1  2  1              8
     35         -1 -1  2  2              8
     36         -1  1 -1  1              2
     37         -1  1 -1  2              8
     38         -1  1  2  2              4
     39         -1  2 -1  2              4
     40         -1  2  1  2              4
     41         -1  2  2  1              4
     42         -1  2  2  2              8
     43          2  2  2  2              1

Table 3.1: States After Symmetry Extraction
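The count of 43 equivalent states can be reproduced by brute force. This is a minimal Python sketch, assuming slices indexed from 0 and modeling the flip as the reflection that sends slice $i$ to slice $-i \pmod 4$ and reverses every request's direction (so $-1$ and $+1$ swap while $0$ and $2$ stay fixed). The flip is a valid equivalence only because $p_1 = p_{-1}$.

```python
from itertools import product

S = 4
VALUES = (-1, 0, 1, 2)      # possible r_i for four slices (0 = null request)

def canon(r):
    """Write a destination offset in the range -(S/2 - 1) .. S/2 (0 stays 0)."""
    x = r % S
    return x if x <= S // 2 else x - S

def rotate(state, k):
    """Rotational symmetry: the request at slice i moves to slice i + k."""
    return tuple(state[(i - k) % S] for i in range(S))

def flip(state):
    """Flip symmetry: slice i maps to slice -i (mod S) and every request
    reverses direction, so -1 and +1 swap while 0 and 2 are unchanged."""
    return tuple(canon(-state[(-i) % S]) for i in range(S))

def orbit(state):
    """All states equivalent to `state` under rotations and flips."""
    images = set()
    for k in range(S):
        r = rotate(state, k)
        images.add(r)
        images.add(flip(r))
    return frozenset(images)

classes = {orbit(s) for s in product(VALUES, repeat=S)}
print(len(classes))                      # 43 equivalent states
print(sum(len(c) for c in classes))      # 256: the reduction factors add up
print(len(orbit((-1, 0, 0, 0))))         # 8, as for state 2 in Table 3.1
```

The orbit sizes are the reduction factors of Table 3.1. Burnside's lemma gives the same count: the eight symmetries fix $256 + 4 + 16 + 4 + 4 \times 16 = 344$ states in total, and $344 / 8 = 43$.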


The optimal arbiter problem can be expressed as a Markovian decision problem based on these 43 states. We number the states as indicated in Table 3.1 and solve this problem using Howard's algorithm [H4]. Figure 3.5 shows the gain (i.e. the mean number of grants per round) for various values of $p_1$ and $p_2$ ($p_0 = 1 - 2p_1 - p_2$).
Figure 3.5: Optimum average number of grants per round for Symmetric Ringbus with four slices and one round grant duration

[Figure: optimum gain plotted for various values of $p_1$ and $p_2$]

Regardless of the probabilities $p_1$ and $p_2$, the optimum decision rule in all states consists of the following two steps:

1. Consider only the request subsets for each state that have the greatest number of requests. This amounts to maximizing the immediate reward in each state.

2. Decide which of the request subsets with maximum immediate reward to grant. (This is trivial if there is only one such subset.)

For all states except 20, 34, 38, and 40, and regardless of the probabilities $p_1$ and $p_2$, the request subset chosen in step 2 of the decision rule is the one that has the most requests of the longest length, i.e. of length 2 (where we define length to be the number of segments required).
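The probability-independent form of the decision rule, which applies in every state except 20, 34, 38, and 40, reduces to a two-level comparison. This is a minimal Python sketch, assuming that the conflict-free request subsets of a state have already been enumerated elsewhere (that check of the logical and topological constraints is not shown) and that each subset is given as the list of its requests' lengths. The function name is hypothetical.

```python
def choose_request_subset(candidates):
    """Decision rule for the states where it does not depend on p_1 and p_2.

    candidates: conflict-free request subsets, each given as a list of the
    lengths (segments required) of its requests.
    """
    most = max(len(c) for c in candidates)
    # Step 1: keep only the subsets with the greatest number of requests.
    best = [c for c in candidates if len(c) == most]
    # Step 2: among those, prefer the subset with the most requests of length 2.
    return max(best, key=lambda c: c.count(2))

print(choose_request_subset([[1, 1], [2, 1], [2]]))   # [2, 1]
```

For states 20, 34, 38, and 40 this rule is not sufficient, because the choice in step 2 depends on $p_1$ and $p_2$, as discussed next.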

For states 20, 34, 38, and 40, the request subset chosen in step 2 of the decision rule depends on the probabilities $p_1$ and $p_2$. States 20, 34, and 40 each have two request subsets with maximum immediate reward, as shown in the diagrams in Figure 3.6.


[Figure: state 20 and the maximum reward request subsets (a) and (b)]

Figure 3.6: Some possible decisions in states 20, 34, and 40

Two sets are associated with each possible decision in a state: a grant set and a leftover set. For a particular state and a particular decision, the grant set consists of all the requests that are granted and a null request for each of the ungranted requests. The leftover set consists of all the requests not granted and a null request for each of the granted requests. For example, if request subset (a) is granted in state 20, $(-1, 2, 1, 0)$ (see Figure 3.6), then the grant set is $(0, 0, 1, 0)$ and the leftover set is $(-1, 2, 0, 0)$; if request subset (b) is granted, the grant set is $(0, 2, 0, 0)$ and the leftover set is $(-1, 0, 1, 0)$. We can write $R = G_d + L_d$, where $R$, $G_d$, and $L_d$ denote the request, grant, and leftover sets respectively, $+$ denotes element-wise addition, and the subscript $d$ indicates that this decomposition of $R$ depends on the decision.
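The decomposition $R = G_d + L_d$ is a simple element-wise split. This is a minimal Python sketch, assuming slices indexed from 0, so request subset (a) of state 20 is slice index 2 and subset (b) is slice index 1. The function name is hypothetical.

```python
def grant_and_leftover(request_set, granted_slices):
    """Split a request set R into its grant set G_d and leftover set L_d.
    Null requests are written as 0; slices are indexed from 0."""
    grant = tuple(r if i in granted_slices else 0 for i, r in enumerate(request_set))
    leftover = tuple(0 if i in granted_slices else r for i, r in enumerate(request_set))
    # R = G_d + L_d element-wise, by construction.
    assert tuple(g + l for g, l in zip(grant, leftover)) == tuple(request_set)
    return grant, leftover

R = (-1, 2, 1, 0)                      # state 20 in Table 3.1
print(grant_and_leftover(R, {2}))      # ((0, 0, 1, 0), (-1, 2, 0, 0))  subset (a)
print(grant_and_leftover(R, {1}))      # ((0, 2, 0, 0), (-1, 0, 1, 0))  subset (b)
```

The leftover sets produced here are the fixed parts of the next states listed for decisions (a) and (b) in Table 3.2.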

The leftover sets associated with request subsets (a) and (b) in Figure 3.6 are the same for each of the states 20, 34, and 40 (using rotational symmetry for state 34). Thus the decisions in these three states amount to the same decision: should the leftover set be (a) or (b)? (See Figure 3.7.)


[Figure: leftover set from request subset (a); leftover set from request subset (b)]

Figure 3.7: Leftover sets associated with request subsets (a) and (b) for each of the three states in Figure 3.6
Of course, the decisions in many groups of states other than 20, 34, and 40 are related through their leftover sets. The Markovian decision problem can in fact be formulated in terms of leftover sets rather than request sets. Assuming that at least one request is granted in every request set, this alternate formulation reduces the number of states required. However, the transition probabilities are more difficult to determine, and the problem structure is less intuitive, in this alternate formulation.

State 38 also has two request subsets with maximum immediate reward. These two request subsets and their associated leftover sets are shown in Figure 3.8.


Figure 3.8: Some possible decisions in state 38

Notice the subtle difference between leftover sets (a) and (b) in Figure 3.8.

The regions over which request subsets (a) and (b) of Figures 3.6 and 3.8 are optimal decisions are shown in Figure 3.9.


Figure 3.9: Optimum decision regions for states 20, 34, 38, and 40

[Figure: regions in which request subset (a) or (b) is optimal, for states 20, 34, and 40 and for state 38 (request subset (b) otherwise)]
To the right of the line separating request subsets (a) and (b) for states 20, 34, and 40, step 2 of the decision rule is the same as that mentioned earlier for all the other states: grant the request subset that has the most requests of length 2. In other words, leftover set (b) (of Figure 3.7) is a better choice than leftover set (a) for $p_1$ and $p_2$ to the right of the line in Figure 3.9.


We now investigate the regions over which request subsets (a) and (b) for states 20, 34, and 40 are optimal (assuming optimal decisions in all other states). Of course, the exact regions over which each of these request subsets is optimal can be computed by applying Howard's policy iteration algorithm. However, policy iteration yields the optimal decision for only a single point $(p_1, p_2)$, so the extent of the regions must be determined from the behaviour at many sample points. This is, in fact, how the regions shown in Figure 3.9 were established. An analytical form for the boundaries of the regions would be much more useful, but such a form seems intractable. Instead, we consider an approximation.

The basic idea is to approximate the relative value (i.e. $v_i^D - v_1^D$) of a state $i$ by the immediate reward, $q_i^{d(i)}$, in that state. First we number the states as listed in Table 3.1. Since there are no genuine requests in state 1, the only possible decision is to grant all the null requests. The immediate reward, $q_1$, is thus zero. The transition probability, $p_{1j}$, is simply the probability that the arriving requests constitute state $j$. For example, the transition probability from state 1 to symmetry state 19, $(-1, 1, 2, 0)$, is $p_{1,19} = 8 p_0 p_1^2 p_2$. (There are 8 ways to go from state 1 to the symmetry state $(-1, 1, 2, 0)$; this is the reduction factor listed in Table 3.1.)
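The arrival probabilities $p_{1j}$ from the all-null state follow from the per-slice request probabilities and the reduction factors. This is a minimal Python sketch with exact fractions. The values $p_1 = 1/10$ and $p_2 = 1/5$ are hypothetical, and $p_{-1} = p_1$ as assumed in the text.

```python
from fractions import Fraction
from itertools import product

def arrival_probability(state, p):
    """Probability that the new requests arriving in state 1 (no genuine
    requests) form exactly `state`; p maps each destination l to p_l."""
    prob = Fraction(1)
    for r in state:
        prob *= p[r]
    return prob

p1, p2 = Fraction(1, 10), Fraction(1, 5)               # hypothetical values
p = {-1: p1, 0: 1 - 2 * p1 - p2, 1: p1, 2: p2}         # p_{-1} = p_1

# Symmetry state 19, (-1, 1, 2, 0), has reduction factor 8 in Table 3.1.
p_1_19 = 8 * arrival_probability((-1, 1, 2, 0), p)
print(p_1_19)   # 6/625, i.e. 8 * p0 * p1**2 * p2 with p0 = 3/5

# Sanity check: the 256 unreduced states exhaust all possible arrivals.
assert sum(arrival_probability(s, p) for s in product(p, repeat=4)) == 1
```

Every member of a symmetry class has the same arrival probability (the flip only swaps $p_1$ and $p_{-1}$), which is why multiplying by the reduction factor is enough.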

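Equation 3.7 thus reduces to

$$g^D = \sum_j p_{1j} (v_j^D - v_1^D) \qquad (3.10)$$

(We drop the superscript $d(1)$ on $p_{1j}$ since there is only one possible decision in state 1.) Substituting equation 3.10 into equation 3.6 yields

$$v_i^D - v_1^D = q_i^{d(i)} + \sum_j \left(p_{ij}^{d(i)} - p_{1j}\right)(v_j^D - v_1^D) \qquad (3.11)$$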


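Now consider $V_i^D(n)$ and the recurrence relation expressed by equation 3.3. Let $V_i^D(0) = 0$ for all $i$. The difference $V_i^D(n) - V_1^D(n)$ is

$$\begin{aligned} V_i^D(n) - V_1^D(n) &= q_i^{d(i)} + \sum_j \left(p_{ij}^{d(i)} - p_{1j}\right) q_j^{d(j)} + \sum_j \sum_k \left(p_{ij}^{d(i)} - p_{1j}\right) p_{jk}^{d(j)} q_k^{d(k)} + \cdots \\ &= q_i^{d(i)} + \sum_{m=0}^{n-2} \sum_j \sum_k \left(p_{ij}^{d(i)} - p_{1j}\right) \phi_{jk}(m)\, q_k^{d(k)} \end{aligned} \qquad (3.12)$$

where $\phi_{jk}(m)$ is the $m$-step transition probability from state $j$ to state $k$: $\phi_{jk}(0) = 1$ if $j = k$ and $0$ otherwise, and $\phi_{jk}(m) = \sum_l p_{jl}^{d(j)} \phi_{lk}(m-1)$ for $m \ge 1$. As $n \to \infty$, $V_i^D(n) - V_1^D(n) \to v_i^D - v_1^D$ by equation 3.4, and thus

$$v_i^D - v_1^D = q_i^{d(i)} + \sum_{m=0}^{\infty} \sum_j \sum_k \left(p_{ij}^{d(i)} - p_{1j}\right) \phi_{jk}(m)\, q_k^{d(k)} \qquad (3.13)$$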

We now have two alternate formulations for the relative values $v_i^D - v_1^D$: equation 3.11 and equation 3.13. Equation 3.11 provides a way to calculate the relative values, and equation 3.13 allows an interpretation of them. We see from equation 3.13 that $v_i^D - v_1^D$ is an infinite sum of probabilistically weighted rewards. Rewritten as

$$v_i^D - v_1^D = \left[ q_i^{d(i)} + \sum_{m=0}^{\infty} \sum_j \sum_k p_{ij}^{d(i)} \phi_{jk}(m)\, q_k^{d(k)} \right] - \sum_{m=0}^{\infty} \sum_j \sum_k p_{1j}\, \phi_{jk}(m)\, q_k^{d(k)} \qquad (3.14)$$

our earlier interpretation of $v_i^D - v_1^D$ as the difference in the average total reward starting in state $i$ relative to starting in state 1 becomes obvious. (Equation 3.14 can be generalized to $v_i^D - v_j^D$.)

Equation 3.13 suggests that $v_i^D - v_1^D$ can be computed to arbitrary accuracy simply by summing enough of the terms on the right-hand side. One way to approximate $v_i^D - v_1^D$, which we now pursue, is by the first term of its infinite series expansion, i.e. $v_i^D - v_1^D \approx q_i^{d(i)}$. This approximation has the merit of avoiding any computation with the transition probabilities. Of course, some accuracy is lost in this simple approximation. However, this merit is very important when the number of states is so large that it is a great deal of work to compute all the transition probabilities. (Such is the case for six and eight slices, as discussed in the sequel.)
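The partial sums of equation 3.13, and their first term, can be compared on a small example. This is a minimal Python/NumPy sketch on a hypothetical two-state chain under a fixed policy, with state index 0 playing the role of state 1. In the Ringbus model $q_1 = 0$. The sketch subtracts $q_1$ explicitly so that it also works for a toy reference state that earns a reward. With `terms=0` it returns the first-term approximation.

```python
import numpy as np

def relative_values_series(P, q, terms):
    """Partial sums of the series (3.13) for w_i = v_i - v_1, with state
    index 0 playing the role of state 1. terms=0 gives w_i ~ q_i - q_1."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    D = P - P[0]          # rows p_ij - p_1j
    w = q - q[0]          # leading term (q_1 = 0 in the Ringbus model)
    x = q.copy()          # x = Phi(m) q, starting with m = 0
    for _ in range(terms):
        w = w + D @ x
        x = P @ x
    return w

# Hypothetical two-state chain under a fixed policy.
P = [[0.5, 0.5],
     [1.0, 0.0]]
q = [1.0, 2.0]
print(relative_values_series(P, q, 0))     # first term only: [0, 1]
print(relative_values_series(P, q, 200))   # converges to [0, 2/3]
```

Here the first-term approximation gives 1 for the relative value whose exact value is $2/3$, which illustrates the accuracy traded for not touching the transition probabilities.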

In some cases the approximation $v_i^D - v_1^D \approx q_i^{d(i)}$ is exact. Consider those states $i$ in which all the requests can be granted simultaneously without conflict. We call the request sets of such states immediately grantable, and we denote the set of such states by $I_G$. If the decision $d(i)$ in some state $i \in I_G$ is such that all the requests are granted, then the leftover set for state $i$ is the same as the leftover set for state 1. Now if two states $k$ and $l$ have the same leftover set, then $p_{kj}^{d(k)} = p_{lj}^{d(l)}$ for all $j$, since the next state is entirely determined by the leftover set and the probability distribution of new request arrivals, which is the same for both states. Thus if $d(i)$ is such that all requests are granted, then $p_{ij}^{d(i)} = p_{1j}$ for all $j$. Equation 3.11 then implies that $v_i^D - v_1^D = q_i^{d(i)}$.

This result can be generalized. Consider any two states $i$ and $j$ with decisions $d(i)$ and $d(j)$ such that both states have the same leftover set. Then $p_{ik}^{d(i)} = p_{jk}^{d(j)}$ for all $k$, and $v_i^D - v_j^D = q_i^{d(i)} - q_j^{d(j)}$. This result follows from the obvious generalization of equation 3.11 to

$$v_i^D - v_j^D = q_i^{d(i)} - q_j^{d(j)} + \sum_k \left(p_{ik}^{d(i)} - p_{jk}^{d(j)}\right)(v_k^D - v_1^D)$$

The determination of all the relative values $v_i^D - v_1^D$, and hence solving for $g^D$, thus amounts to determining the differences in relative values of states with different leftover sets. This is consistent with our earlier observation that the Markovian decision problem can be expressed in terms of leftover sets rather than request sets.

Since the relative value in state $i$, $v_i^D - v_1^D$, represents the difference in the average total reward starting in state $i$ relative to that starting in state 1 (which has only null requests), it seems intuitive that $v_i^D - v_1^D$ should never exceed the number of genuine, i.e. non-null, requests in that state, which we denote by $n_i$. We found that indeed $v_i^D - v_1^D \le n_i$ for all states $i$ in every case we investigated for four (and six) slices. We were unable to establish whether this inequality holds in general.

We now return to our approximation $v_j^D - v_1^D \approx q_j^{d(j)}$ and the determination of an approximate analytical expression for the regions corresponding to request subsets (a) and (b) in states 20, 34, and 40 in the four slice, single round grant duration Symmetric Ringbus. Request subsets (a) and (b) each grant the maximum number of requests possible in each of the states 20, 34, and 40. Thus the choice of request subset (a) or (b) in these three states does not depend on the immediate reward: it depends only on the leftover sets. For a given policy $D$, request subset (a) results in an improvement in the throughput if

$$\sum_j p_{ij}^{a}\, v_j^D > \sum_j p_{ij}^{b}\, v_j^D$$

and request subset (b) results in an improvement if

$$\sum_j p_{ij}^{b}\, v_j^D > \sum_j p_{ij}^{a}\, v_j^D$$


where $i$ = 20, 34, or 40 and we have cancelled the immediate rewards from both sides of the inequalities. Approximating $v_j^D - v_1^D$ by $q_j^{d(j)}$ (the $v_1^D$ terms cancel because $\sum_j p_{ij}^{k} = 1$ for each decision $k$), we have:

$$\Delta = \sum_j p_{ij}^{a}\, q_j^{d(j)} - \sum_j p_{ij}^{b}\, q_j^{d(j)}$$

If $\Delta > 0$ then request subset (a) is best, and if $\Delta < 0$ then request subset (b) is best. Since we already know that the optimal policy consists of granting the maximum number of requests in each state, $q_j^{d(j)}$ is equal to the maximum number of simultaneously grantable requests in state $j$. The leftover sets from request subsets (a) and (b) are shown below.

Figure 3.10: Leftover sets from request subsets (a) and (b) in states 20, 34, and 40


Table 3.2 lists the possible next states (without symmetry removed), the immediate reward in each state, and the transition probability.


                  Leftover Set (a)                                 Leftover Set (b)
Next State      Immediate Reward   Transition Probability    Next State      Immediate Reward
-1, 2,-1,-1     3                  $p_1^2$                   -1,-1, 1,-1     3
-1, 2,-1, 0     2                  $p_0 p_1$                 -1,-1, 1, 0     2
-1, 2,-1, 1     2                  $p_1^2$                   -1,-1, 1, 1     2
-1, 2,-1, 2     2                  $p_1 p_2$                 -1,-1, 1, 2     3
-1, 2, 0,-1     2                  $p_0 p_1$                 -1, 0, 1,-1     2
-1, 2, 0, 0     1                  $p_0^2$                   -1, 0, 1, 0     1
-1, 2, 0, 1     2                  $p_0 p_1$                 -1, 0, 1, 1     2
-1, 2, 0, 2     2                  $p_0 p_2$                 -1, 0, 1, 2     2
-1, 2, 1,-1     2                  $p_1^2$                   -1, 1, 1,-1     2
-1, 2, 1, 0     1                  $p_0 p_1$                 -1, 1, 1, 0     2
-1, 2, 1, 1     2                  $p_1^2$                   -1, 1, 1, 1     3
-1, 2, 1, 2     2                  $p_1 p_2$                 -1, 1, 1, 2     3
-1, 2, 2,-1     3                  $p_1 p_2$                 -1, 2, 1,-1     2
-1, 2, 2, 0     2                  $p_0 p_2$                 -1, 2, 1, 0     1
-1, 2, 2, 1     2                  $p_1 p_2$                 -1, 2, 1, 1     2
-1, 2, 2, 2     2                  $p_2^2$                   -1, 2, 1, 2     2

Table 3.2: Rewards and Transition Probabilities for Decisions (a) and (b) in States 20, 34, and 40


After some algebra we obtain

$$\Delta = p_0 p_2 - p_0 p_1 - p_1 p_2 - p_1^2$$

which, using $p_0 + 2p_1 + p_2 = 1$ for the four slice Symmetric Ringbus, can be further simplified to

$$\Delta = p_1^2 - p_1 + p_2 - 2p_1 p_2 - p_2^2$$
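The simplification of $\Delta$ relies only on the normalization $p_0 + 2p_1 + p_2 = 1$. The short Python sketch below (the function names are illustrative, not from the thesis) checks numerically that the two forms of $\Delta$ agree at random feasible points and applies the sign test that selects request subset (a) or (b) in states 20, 34, and 40.

```python
import random

def delta_unsimplified(p1: float, p2: float) -> float:
    """Delta = p0*p2 - p0*p1 - p1*p2 - p1**2, with p0 = 1 - 2*p1 - p2."""
    p0 = 1.0 - 2.0 * p1 - p2
    return p0 * p2 - p0 * p1 - p1 * p2 - p1 ** 2

def delta_simplified(p1: float, p2: float) -> float:
    """Delta = p1**2 - p1 + p2 - 2*p1*p2 - p2**2 after eliminating p0."""
    return p1 ** 2 - p1 + p2 - 2.0 * p1 * p2 - p2 ** 2

def best_subset(p1: float, p2: float) -> str:
    """Approximate choice: 'a' if Delta > 0, otherwise 'b' (ties go to 'b')."""
    return "a" if delta_simplified(p1, p2) > 0 else "b"

def check_forms(samples: int = 1000, seed: int = 0) -> float:
    """Largest disagreement between the two forms over random feasible (p1, p2)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        p1 = rng.uniform(0.0, 0.5)
        p2 = rng.uniform(0.0, 1.0 - 2.0 * p1)  # keeps p0 >= 0
        worst = max(worst, abs(delta_unsimplified(p1, p2) - delta_simplified(p1, p2)))
    return worst

print(check_forms())            # rounding-level value only
print(best_subset(0.1, 0.5))    # Delta = 0.06 > 0, so 'a'
print(best_subset(0.3, 0.1))    # Delta = -0.18 < 0, so 'b'
```

The boundary $\Delta = 0$ is assigned to (b) here purely as a convention of the sketch.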

The boundary between the regions for request subsets (a) and (b) is given approximately by $\Delta = 0$. This approximate boundary is surprisingly close to the exact boundary between the regions, as shown in Figure 3.11.

We are not so fortunate with the boundary between the regions in which request subsets (a) and (b) in state 38 (see Figure 3.8) are respectively optimal. An analysis similar to that just completed for states 20, 34, and 40, again with $v_j^D - v_1^D$ approximated by $q_j^{d(j)}$ for all $j$, yields $\Delta = p_1(p_1 - p_2)$. Thus the boundary between the regions for request subsets (a) and (b) in state 38 is approximated by $p_1 = p_2$. This approximate boundary and the exact boundary are shown in Figure 3.12. The large discrepancy between these boundaries indicates that $v_j^D - v_1^D \approx q_j^{d(j)}$ is not a very good approximation in this case. This is to be expected since the difference between leftover sets (a) and (b) is very subtle (see Figure 3.8): we expect the average reward per round to be almost the same for request subsets (a) and (b) over much of the $p_1$-$p_2$ probability space. Of course, greater accuracy in estimating the boundary can be achieved by using more terms of equation 3.13 in the approximations of $v_j^D - v_1^D$.
We investigated the approximation $v_j^D - v_1^D \approx q_j^{d(j)}$ in two instances. In the first instance we approximated the test quantity (equation 3.9) in step 3 of Howard's policy iteration algorithm by

$$\max_k \Big( q_i^k + \sum_j p_{ij}^k\, q_j^{max} \Big) \qquad (3.15)$$

where $q_j^{max}$ is the maximum number of grants possible in state $j$. We found that the decision $k$ yielded by this approximate test quantity reliably predicts the optimum decision in state $i$ in most cases. (The main exception was in state 38.) In the second instance we approximated $g^{opt}$ by $g^{est} = \sum_j \pi_j q_j^{max}$. This approximation corresponds to granting the maximum number of requests in every state and ignoring the leftover requests. The comparison of the calculated values of $g^{opt}$ and $g^{est}$, shown in Figure 3.13 for various probabilities, reveals that $g^{est}$ is a good approximation to $g^{opt}$. In every case investigated we found $0 < g^{opt} - g^{est} < 0.22$. Figure 3.13 also shows the optimum average number of grants per round for a crossbar interconnection of four slices. This crossbar interconnection is similar to the Ringbus interconnection except for fewer constraints on which request subsets may be granted. In fact, the only constraints on the request subsets are destination constraints: no two requests that have the same destination can be granted simultaneously. Since a crossbar has fewer constraints than a Ringbus, its performance will always be at least as good as that of a Ringbus (provided everything except for the interconnections is the same).
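The approximate test quantity (3.15) and the estimate $g^{est}$ can be sketched as follows. The data layout (`q[i][k]` for the number of grants of decision $k$ in state $i$, `P[i][k][j]` for transition probabilities, and a hypothetical state-probability vector `pi` for $g^{est}$) is an assumption made for illustration; the thesis does not prescribe one, and the toy model is not taken from the Ringbus.

```python
from typing import List, Sequence

def q_max(q: Sequence[Sequence[float]]) -> List[float]:
    """q_j^max: the largest number of grants over all decisions in each state j."""
    return [max(row) for row in q]

def approx_decision(q, P, i: int) -> int:
    """Decision k in state i maximizing q[i][k] + sum_j P[i][k][j] * q_j^max."""
    qm = q_max(q)
    best_k, best_val = 0, float("-inf")
    for k, reward in enumerate(q[i]):
        val = reward + sum(pij * qj for pij, qj in zip(P[i][k], qm))
        if val > best_val:
            best_k, best_val = k, val
    return best_k

def g_est(pi: Sequence[float], q) -> float:
    """Average of q_j^max weighted by (hypothetical) state probabilities pi_j."""
    return sum(p * m for p, m in zip(pi, q_max(q)))

# Toy two-state example (hypothetical numbers):
q = [[1.0, 2.0], [0.0, 0.5]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: decision 0 stays in 0, decision 1 moves to 1
     [[0.5, 0.5], [1.0, 0.0]]]
print(approx_decision(q, P, 0))  # 0: 1 + 2 beats 2 + 0.5
```

In the toy case the rule picks the decision with the smaller immediate reward because it leads to a state with a larger $q_j^{max}$, which is the kind of leftover-set effect the approximation is meant to capture.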


Figure 3.13: Average number of grants per round with four slices and one round grant duration
3.4 Optimal Arbiter for Four Slices and Grant Duration Greater than One Round

In this section we investigate the optimal arbiter for the Symmetric Ringbus with four slices and deterministic grant durations of 2, 3, and 4 rounds, and geometrically distributed grant durations. The basic state description is

$$(r_1, d_1, r_2, d_2, r_3, d_3, r_4, d_4)$$

as described in section 3.2, and we assume symmetric Ringbus probabilities, i.e. $p_i = p_{S-i}$. As in the previous section, we apply rotational and flip symmetry to significantly reduce the number of states required. In fact, we start with the same 41 states as in the previous section and add the $d_i$


to these states to obtain a complete state description. However, there is one complication: any request may be granted in either of two ways on the Ringbus, either in the shortest direction for requests of length 1 and the clockwise direction for requests of length 2 (we call this the primary direction), or in the longest direction for requests of length 1 and the counterclockwise direction for requests of length 2 (we call this the secondary direction). For a complete description, the state description must include the direction in which each request has been granted, so that the segments allocated to that grant are known. (An alternative […] with each grant in progress its direction on the Ringbus.)

A more efficient technique […] is based on the following two observations:

1. If a request of length 1 is granted in the […] direction, […] the request can be granted in […] granted.

2. If a request of length 2 is granted in the […] direction, […] it can be granted in the […] clockwise direction […].
2. Determine $V_i(n+1)$ for all $i$ from:

$$V_i(n+1) = \max_k \Big[ q_i^k + \sum_j p_{ij}^k\, V_j(n) \Big] \qquad (3.16)$$

The policy in round $n+1$ is $D(n+1) = [d(1), d(2), \ldots]$, where $d(i)$ is equal to the decision $k$ which maximizes the right hand side of equation 3.16 for state $i$.

3. Increment $n$, go to step 2.

The real power of value iteration is due to the so-called Odoni bounds [O…], which give upper and lower bounds on the optimal average reward per round. These bounds improve on each iteration and eventually converge to the optimal average reward per round. Define $\Delta_i(n) = V_i(n) - V_i(n-1)$, $L(n) = \min_i \Delta_i(n)$, and $U(n) = \max_i \Delta_i(n)$. After the $n$th iteration of step 2 we have

$$L(n+1) \le g^{opt} \le U(n+1).$$

Furthermore, $L(n) \ge L(m)$ and $U(n) \le U(m)$ for $m < n$. Therefore after the $n$th iteration of step 2, $g^{opt}$ may be estimated by

$$g^{opt} \approx \frac{U(n+1) + L(n+1)}{2} \pm \frac{U(n+1) - L(n+1)}{2}.$$

As $n \to \infty$, both $L(n)$ and $U(n)$ converge to $g^{opt}$.
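Steps 2 and 3 together with the Odoni bounds can be sketched in Python as below. The stopping rule (stop once $(U - L)/2$ falls below a tolerance), the `q`/`P` data layout, and the toy model are assumptions for illustration. In the toy model, state 0 offers either a reward of 1 with a 50/50 transition or a reward of 2 that forces a move to state 1, and state 1 earns 0 with a 50/50 transition; its optimal average reward is 2/3.

```python
from typing import List, Sequence, Tuple

def value_iteration(q: Sequence[Sequence[float]],
                    P: Sequence[Sequence[Sequence[float]]],
                    tol: float = 0.005,
                    max_iter: int = 1000) -> Tuple[float, float, float, List[int], int]:
    """Value iteration with Odoni bounds (assumes max_iter >= 1).

    q[i][k]    -- immediate reward of decision k in state i
    P[i][k][j] -- probability of moving from state i to j under decision k
    Returns (estimate of g, lower bound L, upper bound U, policy, iterations).
    """
    n = len(q)
    V = [0.0] * n                       # V_i(0)
    L, U, policy = float("-inf"), float("inf"), [0] * n
    for it in range(1, max_iter + 1):
        new_V, policy = [], []
        for i in range(n):
            # Step 2: V_i(n+1) = max_k [ q_i^k + sum_j p_ij^k V_j(n) ]
            vals = [q[i][k] + sum(p * v for p, v in zip(P[i][k], V))
                    for k in range(len(q[i]))]
            best = max(range(len(vals)), key=vals.__getitem__)
            new_V.append(vals[best])
            policy.append(best)
        deltas = [a - b for a, b in zip(new_V, V)]
        L, U = min(deltas), max(deltas)  # Odoni bounds: L <= g_opt <= U
        V = new_V
        if (U - L) / 2 <= tol:
            break
    return (L + U) / 2, L, U, policy, it

q_toy = [[1.0, 2.0], [0.0]]
P_toy = [[[0.5, 0.5], [0.0, 1.0]], [[0.5, 0.5]]]
g, L, U, policy, iters = value_iteration(q_toy, P_toy, tol=1e-6)
print(round(g, 4), policy)  # 0.6667 [1, 0]
```

Only the vector `V` is carried between iterations; the rewards and transition probabilities are re-evaluated on each pass, which mirrors the storage argument made for value iteration.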

For our purposes, value iteration also has implementation advantages over Howard's policy iteration. With value iteration we need only store the $V_i(n)$ for all states and the request probabilities $p_i$. The possible decisions $k$, the associated rewards $q_i^k$, and the transition probabilities $p_{ij}^k$ can be computed on the fly. With policy iteration, it is difficult to solve for $g^D$ and the relative values without first having calculated and stored all the $q_i^k$ and $p_{ij}^k$ for a particular policy, which requires a lot of storage if the number of states is large.

In contrast, with value iteration neither the estimate of $g^{opt}$ nor the estimate of the optimal policy is necessarily exact. However, the error $|g^{opt}(n) - g^{opt}|$ in the estimate $g^{opt}(n)$ after $n$ iterations can be made as small as desired simply by iterating long enough, and the difference between the upper and lower bounds $U(n)$ and $L(n)$ on $g^{opt}$ indicates the accuracy of the estimate.

For the large number of states that we are considering, any other known method would also yield approximate results. With Howard's policy iteration we would have to use iterative techniques to obtain an approximate solution to the large set of simultaneous equations represented by equation […].

In investigating deterministic and geometrically distributed grant durations, we introduce […]. We continue to use $g$ to denote the average reward per round. However, we define the reward in a round as the number of new and continuing grants in that round (rather than just the number of new grants in that round).† Thus $g$ is better thought of as the average number of grants in progress per round. Of course, if all grants have the same duration of one round, then $g$ is also the throughput. We define the throughput as the average number of new grants per round and denote it by $t$. For a general grant duration distribution with mean $d$ rounds, the throughput and average number of grants in progress per round are related by $t = g/d$. The throughput of the Ringbus is also given by the number of slices divided by the mean cycle time per slice. This yields the throughput balance equation

$$\frac{S}{\dfrac{p_0}{1-p_0} + w_{nn} + d} = t = \frac{g}{d} \qquad (3.17)$$

where $p_0/(1-p_0)$ is the mean processing time per slice ($p_0$ is the probability of a null request), $w_{nn}$ is the mean waiting time per request, $d$ is the mean grant duration, and $S$ is the number of slices.
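Assuming the throughput balance equation has the form $t = S/(p_0/(1-p_0) + w_{nn} + d)$, that is, slices divided by mean cycle time per slice, it is easy to evaluate once $w_{nn}$ is known. The sketch below uses hypothetical function names; $w_{nn}$ must be supplied (for example from a simulation), since the balance equation alone does not determine it.

```python
def throughput(S: int, p0: float, w_nn: float, d: float) -> float:
    """t = S / (p0/(1-p0) + w_nn + d): slices divided by mean cycle time per slice."""
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must lie in [0, 1)")
    return S / (p0 / (1.0 - p0) + w_nn + d)

def grants_in_progress(S: int, p0: float, w_nn: float, d: float) -> float:
    """g = d * t: average number of grants in progress per round."""
    return d * throughput(S, p0, w_nn, d)

print(throughput(4, 0.5, 0.0, 1))          # 2.0
print(grants_in_progress(4, 0.5, 1.0, 2))  # 2.0
```

With $w_{nn} = 0$ and $p_0$ close to 1 the mean processing time dominates the cycle, so $g$ for duration $d$ comes out close to $d$ times $g$ for duration 1.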

3.4.1 Deterministic Grant Duration of 2, 3, and 4 Rounds

Using value iteration and Odoni's bounds, as described earlier, we obtained estimates of the optimal average number of grants in progress (trivially related to the throughput) and estimates of the optimal policy for the Symmetric Ringbus with deterministic grant durations of 2, 3, and 4 rounds. Figure 3.14 shows, for selected probabilities, the optimal average number of grants in progress for these three cases and for grant durations of one round as investigated earlier. All these estimates, except those marked with an asterisk, are within ±.005 of optimal. The asterisks indicate estimates for which a tolerance of ±.005 was not achieved after 100 iterations. The maximum error in these estimates, as determined by the bounds $L(100)$ and $U(100)$, is ±.0175.


† There is no theoretical reason to prefer one of these definitions of the reward over the other. With our definition, the solution of the Markovian decision problem yields the average number of grants in progress per round. With the other definition, the solution yields the throughput. The average number of grants in progress per round and the throughput are trivially related ($t = g/d$). However, we did find a practical reason to prefer our definition of the reward over the other definition: the value iteration method converged faster with our definition.
Figure 3.14: Average number of grants in progress per round for deterministic grant durations of 1, 2, 3, and 4 rounds for $S = 4$
Note that the optimal average number of grants in progress (which we will call simply $g$ in the rest of section 3.4) is a strong function of the grant duration $d$ when $p_0$ is large, i.e. under light loading. As $p_0$ decreases, the grant duration has less effect on $g$. In fact, for $p_0 = 0$, $g$ appears to be independent of the grant duration (for deterministic grant durations). These observations make intuitive sense. When $p_0$ is large, nonnull requests are rare and occur with almost equal likelihood regardless of $d$; the difference is that the grants last longer for larger $d$ and thus contribute more to $g$. When $p_0$ is small, the Ringbus is nearly saturated with nonnull requests every round, and thus $d$ has little effect on $g$. Section 3.4.2 examines these observations with more rigour.

Because of the large number of states, especially for $d = 4$, it is impossible to discuss here in detail the estimated optimal decisions in each state for $d$ = 2, 3, and 4. Instead, we will just discuss one main trend observed in the estimated optimal decisions. It was exhausting enough to examine all the states for various probabilities to determine this trend.

The main trend is the following interesting observation: sometimes the estimated optimal decision in some states grants less than the maximum reward in those states. The states in which this phenomenon was observed fall into two main classes: 1) states with a small number (one or two) of nonnull requests, all ungranted so far, and 2) states with a large number of requests with grants in progress and a small number of ungranted nonnull requests. An example of a state in the first class is the state

[…]

Although the request […] is immediately grantable, the estimated optimal decision sometimes is not to grant the request. Another such example is the state

[…]

[…] An example of a state in the second class is the state

[…]

[…]
The less-than-the-maximum-reward tendency is most pronounced for heavy loading (i.e. small $p_0$) and large $d$. For light loading (i.e. large $p_0$), the estimated optimal decision in every state grants the maximum reward.

An explanation for the observed tendencies is that there is a tradeoff between the immediate contribution to throughput obtained by granting a request and the possible future degradation to throughput caused by the constraints that granting a request imposes on future grants. For a grant duration of $d$ rounds, any request granted imposes constraints on which requests may be granted in the $d-1$ rounds after it is first granted. For light loading it is unlikely that a nonnull request will arrive within $d$ rounds of granting a request, and thus the future degradation caused by granting requests is negligible. Therefore the tradeoff is in favour of granting requests immediately, hence the tendency towards granting the maximum reward for light loading. For heavy loading, it is very likely that a nonnull request will arrive within $d$ rounds of granting a request, and thus the future degradation caused by granting requests can be very significant. This degradation can be reduced by avoiding large differences in the time that different grants have been in progress. States in the first class (of the two mentioned earlier) achieve this by not granting any requests.

In heavy loading, additional nonnull requests will likely arrive very soon, so it makes better sense to wait until these requests arrive and grant all the requests at once, rather than grant the initial request and delay the granting of any subsequent nonnull requests until the grant of the initial request terminates. States in the second class avoid differences in the duration of grants in progress by delaying the granting of new nonnull requests until the grants in progress terminate. The result in heavy loading is that all grants in progress tend to have been in progress for the same duration.

Thus in heavy loading there is a tendency towards granting requests at intervals of $d$ rounds. We call such an algorithm an interval algorithm. Note that an interval algorithm completely eliminates the throughput degradation caused by the constraints a grant imposes on future grants, since grants in one interval do not impose constraints on the grants in succeeding intervals.

3.4.2 Deterministic Grant Durations: the General Case

In this section […]. We assume […].

3.4.2.1 General Characteristics of Optimum Throughput

[…]




[…] $\partial t^{opt}/\partial p_a$ and $\partial t^{opt}/\partial p_b$ […] need not be.


For any particular policy (i.e. set of decisions in each state), all the derivatives of $t$ with respect to the probabilities will be continuous. However, the optimum policy can vary with the probabilities. Thus the optimum throughput over any portion of the feasible probability region is, in general, a piecewise combination of the throughput of the optimum policy in each subregion. The derivatives of $t^{opt}$ with respect to the probabilities will not, in general, be continuous at the boundaries of the subregions. Fortunately, the number of discontinuities along any ray is finite, since there is only a finite number of different policies. Strictly speaking, $\partial t^{opt}/\partial(1-p_0)$ is not defined at such a discontinuity, but it may be defined to have the value of one of the policies at the point of discontinuity. In this sense, $\partial t^{opt}/\partial(1-p_0)$ is strictly positive for all $p_0$ along a ray (except possibly at the end points).

It can also be argued that $\partial^2 t^{opt}/\partial(1-p_0)^2 < 0$ for all $p_0$ along a ray, except perhaps at discontinuities at the boundaries of the subregions corresponding to different policies. (Recall that the optimum policy can vary with the probabilities. Note also that $\partial^2 t^{opt}/\partial(1-p_0)^2$ is not defined at such discontinuities.) Within any particular subregion along a ray, the rate of increase of $t^{opt}$ with $1-p_0$, i.e. $\partial t^{opt}/\partial(1-p_0)$, decreases as $1-p_0$ increases, since there are fewer null requests to replace by nonnull requests (and thus increase $t^{opt}$) as $1-p_0$ increases. Hence $\partial^2 t^{opt}/\partial(1-p_0)^2 < 0$ within the subregion, and thus $t^{opt}$ is convex down in $1-p_0$ within each subregion along any ray. Note that it does not follow from this that $t^{opt}$ is convex down everywhere along a given ray, even if $\partial^2 t^{opt}/\partial(1-p_0)^2$ is redefined at points of discontinuity. In summary, $t^{opt}$ is monotonic in $1-p_0$ everywhere along a ray and convex down in $1-p_0$ within any subregion along a ray.


3.4.2.2 Bounds on the Optimum Throughput with Deterministic Grant Durations

Let $t_{\vec p\,'}^{d>1}$ and $g_{\vec p\,'}^{d>1}$ denote the optimal throughput and the optimal average number of grants per round, respectively, for grants with a deterministic duration of $d > 1$ rounds and for some set of probabilities $p_0', p_1', \ldots, p_{S/2}'$, denoted by $\vec p\,'$ ($S$ is the number of slices). Similarly, let $t_{\vec p}^{d=1}$ and $g_{\vec p}^{d=1}$ denote the optimal throughput and optimal average number of grants per round, respectively, for grants with a duration of one round and for some set of probabilities $p_0, p_1, \ldots, p_{S/2}$, denoted by $\vec p$. Now for the same set of probabilities in each case (i.e. $\vec p\,' = \vec p$), the optimal throughput with $d > 1$ can be no more than the optimal throughput with $d = 1$. This follows since for a fixed set of probabilities the optimal throughput cannot increase as $d$ increases. Thus

$$t_{\vec p}^{d>1} \le t_{\vec p}^{d=1} \quad\text{or}\quad g_{\vec p}^{d>1} \le d\, g_{\vec p}^{d=1}$$


For $d > 1$ it is possible to achieve at least the same average number of grants per round as for $d = 1$ with the same set of probabilities. The argument is as follows. Restrict the instants at which all new requests (even null requests) for $d > 1$ can be granted to the beginning of every $d$th round, in synchrony with some clock of period $d$ rounds. (Note that null requests have a grant duration of one round and nonnull requests have a grant duration of $d$ rounds. Restricting the granting of null requests to every $d$ rounds synchronous with the clock of period $d$ artificially lengthens the grant duration of a null request to $d$ rounds.) At the "arbitration instant" at the beginning of each successive interval of $d$ rounds (synchronous with the clock of period $d$), grant the request subset corresponding to the optimal decision for that request set with $d = 1$ and the same set of probabilities. The result is an arbitration algorithm for $d > 1$ which is exactly the same as the optimal arbitration algorithm with $d = 1$, the same set of probabilities, and an arbiter clock period of $d$. That is, by

1) restricting the instants at which new requests (even null requests) can be granted to every $d$ rounds synchronous with some clock of period $d$, and

2) using this clock of period $d$ as the arbiter clock,

the arbitration problem reduces to that for $d = 1$. Thus

$$t_{\vec p}^{d>1} \ge \frac{1}{d}\, t_{\vec p}^{d=1} \quad\text{or}\quad g_{\vec p}^{d>1} \ge g_{\vec p}^{d=1}$$


We call an arbiter algorithm that operates in accordance with point 1 above an interval algorithm. We call the optimum algorithm subject to this restriction the optimum interval algorithm. As just discussed, the optimal interval algorithm is exactly the same as the optimal algorithm for $d = 1$ and achieves the same average number of grants in progress per round.
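The interval-algorithm construction can be sketched as a wrapper around any arbitration rule for $d = 1$. The callable `decide_d1` and the request encoding (0 for a null request) are hypothetical placeholders; the sketch only illustrates that new grants, including grants of null requests, are issued solely at rounds that are multiples of $d$.

```python
from typing import Callable, List, Sequence

Decision = Callable[[Sequence[int]], List[int]]

def interval_arbiter(decide_d1: Decision, d: int) -> Callable[[int, Sequence[int]], List[int]]:
    """Return an arbiter that consults decide_d1 only at arbitration instants.

    decide_d1(requests) -> indices of the slices whose requests are granted
    (a hypothetical stand-in for the optimal d = 1 policy).
    """
    if d < 1:
        raise ValueError("d must be a positive integer")

    def arbitrate(round_no: int, requests: Sequence[int]) -> List[int]:
        if round_no % d != 0:
            return []                  # between instants no new grant is issued
        return decide_d1(requests)     # same decision as the d = 1 arbiter
    return arbitrate

def first_nonnull(reqs: Sequence[int]) -> List[int]:
    """Toy d = 1 rule: grant only the first nonnull request."""
    return [i for i, r in enumerate(reqs) if r != 0][:1]

arb = interval_arbiter(first_nonnull, d=3)
print([arb(n, [0, 1, 2, 0]) for n in range(7)])  # [[1], [], [], [1], [], [], [1]]
```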

These lower bounds on $t_{\vec p}^{d>1}$ and $g_{\vec p}^{d>1}$ can be tightened by removing the restriction that null requests can only be granted at the beginning of every $d$ rounds synchronous with the clock of period $d$. Instead, let null requests be granted immediately whenever they occur, as was the case in our original formulation of the arbitration problem. However, this affects the probability of requests as seen by the (restricted) arbiter at every arbitration instant. The (restricted) arbiter sees a null request at an arbitration instant if there have been exactly $d$ null requests since the last arbitration instant; otherwise it sees a nonnull request. Thus the (restricted) arbiter sees a null request with probability $(p_0)^d$ and a nonnull request of length $i$ with probability $p_i\,(1-(p_0)^d)/(1-p_0)$. Hence

$$t_{\vec p}^{d>1} \ge \frac{1}{d}\, t_{\vec p\,'}^{d=1} \quad\text{or}\quad g_{\vec p}^{d>1} \ge g_{\vec p\,'}^{d=1}$$

where $p_0' = (p_0)^d$ and $p_i' = \dfrac{p_i\,(1-(p_0)^d)}{1-p_0}$, $1 \le i \le S/2$.

This lower bound is easily seen to be tighter than the previous one: $\vec p\,'$ lies on the same ray as $\vec p$ with $1 - p_0' \ge 1 - p_0$, and $\dfrac{\partial t^{d=1}}{\partial(1-p_0)} > 0$ for every $p_0$ along any ray in the feasible probability region.

The complete bounds on the optimal throughput for a deterministic grant duration of $d > 1$ rounds are:

$$\frac{1}{d}\, t_{\vec p\,'}^{d=1} \le t_{\vec p}^{d>1} \le t_{\vec p}^{d=1}$$

or

$$g_{\vec p\,'}^{d=1} \le g_{\vec p}^{d>1} \le d\, g_{\vec p}^{d=1}$$

where $p_0' = (p_0)^d$ and $p_i' = \dfrac{p_i\,(1-(p_0)^d)}{1-p_0}$, $1 \le i \le S/2$. Note that these bounds are expressed completely in terms of the optimal throughput for $d = 1$, which is a much simpler problem than for $d > 1$.
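The bounds only require the $d = 1$ solution evaluated at two probability vectors. A minimal sketch, assuming the probabilities are passed as the list $[p_0, p_1, \ldots, p_{S/2}]$ and that `t_d1` is some separately supplied routine returning the optimal $d = 1$ throughput (hypothetical here; the test uses a made-up placeholder):

```python
from typing import Callable, List, Sequence, Tuple

def restricted_probs(p: Sequence[float], d: int) -> List[float]:
    """Probabilities seen by the restricted arbiter:
    p0' = p0**d and p_i' = p_i * (1 - p0**d) / (1 - p0), 1 <= i <= S/2."""
    p0 = p[0]
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must lie in [0, 1)")
    scale = (1.0 - p0 ** d) / (1.0 - p0)
    return [p0 ** d] + [pi * scale for pi in p[1:]]

def throughput_bounds(t_d1: Callable[[Sequence[float]], float],
                      p: Sequence[float], d: int) -> Tuple[float, float]:
    """(lower, upper) bounds on the optimal throughput for deterministic d > 1."""
    return t_d1(restricted_probs(p, d)) / d, t_d1(p)

# Four slices, symmetric probabilities: p0 + 2*p1 + p2 = 1.
p = [0.5, 0.2, 0.1]
pp = restricted_probs(p, 3)
print(pp)  # approximately [0.125, 0.35, 0.175]
```

The transformed vector still satisfies the normalization and keeps the ratios of the nonnull probabilities, so it stays on the same ray.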

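$t_{\vec p}^{d>1}$ and $g_{\vec p}^{d>1}$ approach their respective upper bounds as $p_0 \to 1$. This can be shown as follows. From equation 3.17 we have

$$t_{\vec p}^{d>1} = \frac{S}{\dfrac{p_0}{1-p_0} + w_{nn}^{d>1} + d} = \frac{S(1-p_0)}{p_0 + (1-p_0)(w_{nn}^{d>1} + d)}$$

and

$$t_{\vec p}^{d=1} = \frac{S(1-p_0)}{p_0 + (1-p_0)(w_{nn}^{d=1} + 1)}.$$

Taking the limit as $p_0 \to 1$, for which $w_{nn} \to 0$, we have

$$\lim_{p_0 \to 1} \frac{t_{\vec p}^{d>1}}{t_{\vec p}^{d=1}} = 1 \quad\text{and}\quad \lim_{p_0 \to 1} \frac{g_{\vec p}^{d>1}}{g_{\vec p}^{d=1}} = d.$$

Thus $t_{\vec p}^{d>1}/t_{\vec p}^{d=1} \to 1$ and $g_{\vec p}^{d>1}/(d\,g_{\vec p}^{d=1}) \to 1$ as $p_0 \to 1$. The estimated optimal average number of grants per round for light loading shown in Figure 3.14 correlates well with this latter result. This result justifies the intuitive reason given in section 3.4.1 for the strong relationship of $g_{\vec p}^{d>1}$ with $d$ for light loading.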


Similarly, $t_{\vec p}^{d>1}$ and $g_{\vec p}^{d>1}$ seem to approach their respective lower bounds as $p_0 \to 0$, as suggested by the results for $p_0 = 0$ in Figure 3.14 for $S = 4$ and $d$ = 2, 3, and 4 rounds. We were unable to prove this conjecture.


3.4.2.3 Approximating the Optimal Throughput with Deterministic Grant Duration

If $d$, the grant duration, is large, the number of states required to calculate $g_{\vec p}^{d>1}$ is very large, making its calculation difficult. A more attractive approach for large $d$ is to approximate $g_{\vec p}^{d>1}$. In this subsection we present a simple approximation to $g_{\vec p}^{d>1}$ using $g_{\vec p}^{d=1}$ and the value of $d$. Since $g_{\vec p}^{d>1}$ is trivially related to $t_{\vec p}^{d>1}$, this approximation also applies (although indirectly) to $t_{\vec p}^{d>1}$.

The ratio $g_{\vec p}^{d>1}/g_{\vec p}^{d=1}$ has the following properties along a ray:

1) It is a continuous function of $1-p_0$.

2) $\displaystyle\lim_{p_0 \to 1} \frac{g_{\vec p}^{d>1}}{g_{\vec p}^{d=1}} = d$.

3) $\displaystyle\lim_{p_0 \to 1} \frac{\partial}{\partial(1-p_0)} \left( \frac{g_{\vec p}^{d>1}}{g_{\vec p}^{d=1}} \right) = -d(d-1)$.

In addition we conjecture that $\displaystyle\lim_{p_0 \to 0} \frac{g_{\vec p}^{d>1}}{g_{\vec p}^{d=1}} = 1$.
We choose to approximate $g_{\vec p}^{d>1}/g_{\vec p}^{d=1}$ by an exponential function with the same value ($d$) and slope ($-d(d-1)$) at $p_0 = 1$, and the same asymptotic value (1) as conjectured. The resulting approximation along any ray is

$$g_{\vec p}^{d>1} \approx g_{\vec p}^{d=1}\left[\,1 + (d-1)\,e^{-d(1-p_0)}\right] \qquad (3.18)$$
For $d$ = […] the approximation overestimates […] for about $p_0 >$ […]. For $d = 4$ the approximation seems to slightly underestimate […] for about […] $< p_0 < 1$, and otherwise it slightly overestimates […].

We do not know what performance to expect of the approximation in equation 3.18 for larger values of $S$ and $d$. However, it is clear that the approximation is roughly correct: $g_{\vec p}^{d>1}/g_{\vec p}^{d=1}$ is expected to decrease from $d$ toward 1 as $1-p_0$ increases from 0 to 1. The approximation has two key advantages. First, it is very simple to calculate. Second, it reduces the determination of $g_{\vec p}^{d>1}$ to the determination of $g_{\vec p}^{d=1}$, a much simpler problem. This second advantage cannot be overstated: for large $S$ it is difficult enough to determine $g_{\vec p}^{d=1}$, as discussed later in this chapter. […] Indeed, in the remainder of this chapter (excluding the next two subsections, 3.4.3 and 3.4.4), we only consider the Ringbus with $d = 1$ and point to equation 3.18 for treatment of arbitrary deterministic grant durations.
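Requiring a function of the form $A + B\,e^{-c(1-p_0)}$ to have value $d$ and slope $-d(d-1)$ at $p_0 = 1$ and asymptote 1 gives $A = 1$, $B = d - 1$, and $c = d$. The sketch below evaluates that form, which we assume is what equation 3.18 expresses, and checks the value and slope conditions numerically. The function names are illustrative, and `g_d1` is assumed to come from a separate $d = 1$ calculation.

```python
import math

def ratio_approx(p0: float, d: int) -> float:
    """Approximate g^{d>1} / g^{d=1} as 1 + (d-1) * exp(-d * (1 - p0))."""
    return 1.0 + (d - 1) * math.exp(-d * (1.0 - p0))

def g_approx(g_d1: float, p0: float, d: int) -> float:
    """Approximate g^{d>1} from the d = 1 value g_d1 (supplied separately)."""
    return g_d1 * ratio_approx(p0, d)

d = 4
h = 1e-6                                                      # step in x = 1 - p0
print(ratio_approx(1.0, d))                                   # 4.0: value d at p0 = 1
print((ratio_approx(1.0 - h, d) - ratio_approx(1.0, d)) / h)  # close to -d*(d-1) = -12
print(ratio_approx(0.0, d))                                   # about 1.055, near the conjectured 1
```

At $p_0 = 0$ the exponential has not quite reached its asymptote, so the ratio there is $1 + (d-1)e^{-d}$ rather than exactly 1.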
To determine the optimal throughput of the four-slice Symmetric Ringbus with geometric grant durations, we used the same state description as discussed earlier for deterministic grant durations, except that instead of interpreting each slice's grant variable (0 or 1) as the number of rounds that [...], we interpreted it as a boolean value indicating whether or not the slice's request was granted in the preceding round. If a slice's request was granted in the preceding round, then it remains granted in the current round with probability p_cont, and it terminates immediately prior to the current round with probability 1 - p_cont. Thus the grant duration of each request was a geometric random variable with mean

$$\frac{1}{1 - p_{\mathrm{cont}}}.$$
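As a quick check of the mean just stated, the Python sketch below compares the closed form 1/(1 - p_cont) with a truncated sum of k · P(D = k), where P(D = k) = p_cont^(k-1) (1 - p_cont) is the probability that a grant lasts exactly k rounds. The function names and the truncation length are illustrative choices, not part of the original study.

```python
def mean_grant_duration(p_cont):
    """Closed-form mean of a geometric grant duration on {1, 2, ...}."""
    return 1 / (1 - p_cont)

def mean_by_series(p_cont, max_rounds=10_000):
    """Truncated sum of k * P(D = k) with P(D = k) = p_cont**(k-1) * (1 - p_cont)."""
    return sum(k * p_cont ** (k - 1) * (1 - p_cont)
               for k in range(1, max_rounds + 1))

# Example values of p_cont; the series and the closed form should agree.
for p in (0.001, 0.5, 0.75, 0.9):
    print(p, mean_grant_duration(p), round(mean_by_series(p), 6))
```

A grant that continues with probability .9 each round therefore lasts ten rounds on average, which is why the larger values of p_cont behave like long deterministic grants.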


Using value iteration and the Odoni bounds, we obtained estimates of the optimal average number of grants in progress and estimates of the optimal policy for p_cont = .001, .5, .75, and .9. Figure 3.16 shows the estimated optimal average number of grants in progress per round for these four cases with various request probabilities. All these estimates are within ±.005 of optimal.
Figure 3.16: Average number of grants in progress per round
for various geometrically distributed grant durations
[...] this exclusion of the set [...] until all grants [...] in progress [...]
3.4.4  Other Grant Duration Distributions

We cannot say much about the effect of other grant duration distributions without further study. However, we can give the following generalities about the optimum throughput with any grant duration distribution:


1) Along any ray for which the nonnull probabilities have some fixed ratio, t_opt is monotonic in 1 - p_0 everywhere along the ray and convex down in 1 - p_0 within any subregion (i.e. within any region in which one particular policy is optimal) along the ray. The argument to support these two conclusions is the same as that given in section 3.4.2.1.

2) If t_d^opt denotes the optimal throughput with some grant duration distribution with mean d and some request probabilities p_0, p_1, ..., p_{S/2}, denoted collectively by $\vec{p}$, and if t_1^opt denotes the optimal throughput with a deterministic grant duration of one round and the same request probabilities $\vec{p}$, then
[...] where r^max is the maximum reward in [...] state. In most cases [...] is a surprisingly good estimate of the [...]
[...] these results are shown in Figure 3.17 [...] to avoid cluttering the figure). In all cases, except for four along the edge p_0 = 0, p_[...] = 0 and one at p_0 = 0, p_1 = .100, p_2 = .400, and p_3 = 0, we found [...]

There are far too many states to determine and analyze the optimal decision regions as we did for four slices in section 3.3. Furthermore, the decisions determined by the value iteration do not necessarily comprise an optimal policy; they only comprise an estimate of the optimal policy,* so it is best not to examine them too closely. Thus we will only discuss the main trends in the decisions. We will also discuss the performance of some rule-of-thumb policies.


† DEC and VAX are trademarks of the Digital Equipment Corporation.

* As discussed in section 3.4, a policy determined via value iteration is optimal only in the sense that the throughput with that policy is within some interval, given by the Odoni bounds, of the optimal throughput.
The most interesting trend in the decisions is that, unlike the case for four slices, it is not always best to grant the request subset with the maximum number of requests (i.e. reward). For every set of probabilities considered, we found at least one state in which the estimated optimal decision is to grant some request subset having less than the maximum reward. The number of states with such estimated optimal decisions is small for p_0 large (i.e. light traffic) and increases rapidly as p_0 decreases (i.e. as traffic increases). The most rapid increase in the number of these states as p_0 decreases occurs for probabilities in the p_2-p_1 plane, i.e. for p_3 = 0.

One state in which we found the estimated optimal decision to grant less than the maximum reward is (-2, 3, -1, 1, 1, 1). The subset with maximum reward is (0, 0, -1, 1, 1, 1). However, for every set of probabilities we considered, the estimated optimal decision is to grant the subset (0, 3, 0, 0, 1, 1). The request set and these two subsets are pictured in the diagrams in Figure 3.18.


Figure 3.18: An example of a request set for which the estimated optimal
decision is to grant less than the maximum number of requests

Note that the requests of length 2 and 3 conflict. Evidently this conflict reduces the value of the leftover of the maximum reward subset, compared to the value of the leftover of the estimated optimal decision, to a degree that cannot be overcome by the larger reward. An additional factor is that both requests in the leftover of the maximum reward subset are long requests. In heavy traffic, long requests "cost" more than short requests to grant, since they involve blocking a request from each of the one or two slices along the route over which the long request is granted. This factor seems to predominate in the states in which the estimated optimal decision is to grant less than the maximum reward: all such states have a mix of long and short requests. In states in which all nonnull requests are of the same length, the estimated optimal decision always is to grant one of the request subsets with maximum reward. A state typical of those in which the estimated optimal decision is to grant less than the maximum reward is

(  2.  I.  I.  2.  1.  I). 

The subset with maximum reward is (0. 1.0. 2. 0, !), but for many sets of probabilities the estimated optimal decision is the subset ( 2. 0. 0. 0. \ , 0). A more glaring example is the state

(  I.  1.3.  I.  I.  .I). 

The subset with maximum reward is ( I. 1.0.0. I. 1.0), yet (0.0. 1.0 0 I) is often estimated as the best subset. Note that the leftovers for both these subsets are immediately grantable.

To determine the significance of the fact that the estimated optimal decision often indicates that the request subset with maximum reward is not the best to grant, we modified our value iteration program to find the optimal throughput of the Ringbus with the additional constraint that the request subset granted in each state must have the maximum reward possible for that state. We call this the maximum reward constraint. Figure 3.19 shows the optimal throughput (to within ±.005) of the Ringbus with this constraint for selected probabilities. The amount by which this throughput is less than that without the maximum reward constraint for a particular set of probabilities is indicated (to within 3 decimal places) by the quantity in the brackets.†

For most of the sets of probabilities investigated, and especially for light traffic (i.e. p_0 large), the optimal throughput of the Ringbus is not significantly affected by the maximum reward constraint. The most significant reduction caused by this constraint occurs mostly on the face p_2 = 0 and near the face p_0 = 0 for p_1 large. Another way to describe this region is that p_0 is rather small, p_1 is large, and p_2 is very small. In other words, traffic is fairly heavy and there are mainly short and long requests. Of the probability sets considered, the largest reduction in throughput (at least .057) occurred at p_1 = .4, p_2 = 0, and p_3 = .2. The fact that the maximum reward strategy is not optimal in this region, with heavy traffic and mainly short and long requests, is easy to see.


† The quantity in the brackets is actually t_L - t_U, where t_U is the upper bound on the optimal throughput with the maximum reward constraint and t_L is the lower bound on the optimal throughput without this constraint. Thus the actual difference in throughputs exceeds that indicated. Nothing is indicated inside the brackets if t_L ≤ t_U.


Figure 3.19 legend: the number beside each point is the optimal throughput with the maximum reward constraint; the bracketed number is the amount this throughput is less than the unconstrained optimal throughput (see footnote).
Request subsets with only short requests will usually have a larger reward than subsets with a long request, so a long request will tend to remain ungranted for a long time (until most of the other requests are also long). The slice which submitted the long request is prevented from submitting further requests. Thus a six-slice Ringbus could be effectively reduced to five operating slices. This effect can be particularly severe if p_0 ≈ 0 and the long request is very rare compared to the short requests. In this case the long request will remain ungranted for a very long time and will prevent many potential short requests from its originating slice from being generated and contributing to the throughput. For p_0 ≈ 0, p_1/(1 - p_0) ≈ .5, p_2/(1 - p_0) ≈ 0, and p_3/(1 - p_0) ≈ 0, the optimal decision is obviously to grant any long requests first. On the other hand, for p_0 ≈ 1, p_1/(1 - p_0) ≈ .5, p_2/(1 - p_0) ≈ 0, and p_3/(1 - p_0) ≈ 0, any long request will just wait (a short time on average) until there are no other requests to conflict with the long request, and thus the optimal decision is to grant the maximum number of requests.

An examination of the estimated optimal decisions revealed that the request subset chosen often utilized the maximum number of segments. This tendency seemed particularly strong in those states for which the estimated optimal decision was not the maximum reward request subset. To establish the merit of the maximum number of segments strategy, we modified our value iteration program to find the optimal throughput of our Ringbus model with the additional constraint that the request subset granted in each state must utilize the maximum number of segments possible for that state. In computing the number of segments a request subset requires, we use the number of segments that each request would require if it were granted in the shortest direction around the Ringbus. Thus the number of segments that a request subset utilizes is equal to the sum of the request lengths for those requests granted. Figure 3.20 shows the optimal throughput (to within ±.005) of the Ringbus with the maximum number of segments constraint for various probabilities. As for Figure 3.19, the amount that this throughput is less than the unconstrained optimal throughput (displayed in Figure 3.17) for a particular set of probabilities is indicated (to within 3 decimal places) by the quantity in the brackets.†
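To make the two criteria concrete, the Python sketch below enumerates every grantable request subset of a six-slice state and picks the subset favoured by the maximum reward criterion and by the maximum number of segments criterion. The routing convention is an assumption for illustration (slices numbered 0 to S-1, segment k joining slices k and k+1, positive offsets routed clockwise toward increasing index, negative offsets counterclockwise), as are the helper names footprint, grantable_subsets and as_decision. A subset is grantable if no two of its requests share a segment or a destination. Applied to the state of Figure 3.18, the maximum reward subset is (0, 0, -1, 1, 1, 1), while the maximum number of segments subset is (0, 3, 0, 0, 1, 1), the decision that value iteration estimated to be optimal.

```python
from itertools import combinations

def footprint(i, r, S):
    """Segments used and destination of nonnull request r from slice i (assumed routing)."""
    if r > 0:                                   # clockwise: segments i, i+1, ...
        segs = frozenset((i + k) % S for k in range(r))
    else:                                       # counterclockwise: segments i-1, i-2, ...
        segs = frozenset((i - 1 - k) % S for k in range(-r))
    return segs, (i + r) % S

def grantable_subsets(state):
    """Yield every subset of nonnull requests with no segment or destination conflict."""
    S = len(state)
    active = [i for i, r in enumerate(state) if r != 0]
    fp = {i: footprint(i, state[i], S) for i in active}
    for k in range(len(active) + 1):
        for subset in combinations(active, k):
            used, dests = set(), set()
            for i in subset:
                segs, dest = fp[i]
                if segs & used or dest in dests:
                    break
                used |= segs
                dests.add(dest)
            else:                               # no conflict found
                yield subset

def as_decision(state, subset):
    """Write a granted subset in the same tuple notation as the state."""
    return tuple(r if i in subset else 0 for i, r in enumerate(state))

state = (-2, 3, -1, 1, 1, 1)                    # the state of Figure 3.18
subsets = list(grantable_subsets(state))
max_reward = max(subsets, key=len)              # ties: first one found
max_segments = max(subsets, key=lambda s: sum(abs(state[i]) for i in s))
print(as_decision(state, max_reward))           # (0, 0, -1, 1, 1, 1)
print(as_decision(state, max_segments))         # (0, 3, 0, 0, 1, 1)
```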

‡ The throughput listed beside the point p_1 = .5, p_2 = p_3 = 0 in Figures 3.17 and 3.19 is actually for the point p_1 = .498, p_2 = p_3 = .001. All zero probabilities were replaced with very small probabilities so that the same state space and the same program could be used to calculate all throughputs for six slices without possible problems caused by noncommunicating states. (All states communicate if p_i > 0 for 0 ≤ i ≤ S/2.) This accounts for the apparent contradiction between our earlier observation that the estimated optimal decision is to grant the maximum reward request subset in states with all nonnull requests of the same length, and the fact that the throughput listed in Figure 3.19 for the point p_1 = .5, p_2 = p_3 = 0 with the maximum reward constraint is less than optimal. A separate analysis confirmed that exactly at the point p_1 = .5, p_2 = p_3 = 0, the optimal policy grants the maximum reward subset in every state. (Of course, exactly at the point p_1 = .5, p_2 = p_3 = 0, all states have all requests of the same length.)

† The quantity in the brackets is actually t_L - t_U, where t_U is the upper bound on the optimal throughput with the maximum number of segments constraint and t_L is the lower bound on the optimal throughput without this constraint. Thus the actual difference in throughputs exceeds that indicated. Nothing is indicated inside the brackets if t_L ≤ t_U.


Figure 3.20 legend: the number beside each point is the optimal throughput with the maximum number of segments constraint; the bracketed number is the amount this throughput is less than the unconstrained optimal throughput (see footnote).
For all the sets of probabilities investigated, the maximum number of segments constraint only caused a notable reduction in throughput for p_1 large, p_2 ≈ 0, and p_3 small to medium. This is the same region (heavy traffic, mainly short and long requests) for which the maximum reward constraint caused the most significant reduction in throughput. However, the maximum number of segments constraint never caused as large a reduction in throughput as the maximum reward constraint. In fact, because the reduction in throughput with the maximum number of segments constraint is comparatively small, this constraint provides an efficient compromise between the conflicting desires to grant the maximum number of requests in a state and to minimize the waiting time of long requests.

One might conjecture that the optimal policy grants the request subset in each state with either the maximum reward or the maximum number of segments. However, this conjecture seems to be false in general. It is indeed true that for most states and for most probabilities, the estimated optimal decisions correspond to either the maximum reward or the maximum number of segments (or both) request subsets. As p_0 decreases and p_1 increases, the number of states in which the estimated optimal decision corresponds to neither the maximum reward nor the maximum number of segments increases, but it never exceeds about [...] states. Two typical states in which the estimated optimum decision is often neither the maximum reward nor the maximum number of segments subset are (-2, 2, 3, 1, 2, 1) and (-2, 3, -1, 1, 1, -1). For the former state, the request subsets (0, 0, 3, -1, 0, 1) and (0, 2, 0, 1, 0, 1) achieve the maximum reward, and the request subset (0, 0, 3, -1, 0, 1) uniquely achieves the maximum number of segments. For the latter state, the request subsets (0, 3, -1, 1, 0, 0) and (2, 0, 1, 1, 0, 0) achieve the maximum reward, and the request subset (0, 3, -1, 1, 0, 0) uniquely achieves the maximum number of segments. However, the estimated optimum decision in these two states is often (-2, 0, 0, 0, 2, 0) and (0, 3, 0, 0, 1, 0), respectively. Figure 3.21 depicts diagrams of these various possible decisions in the two states.
The request subsets (-2, 0, 0, 0, 2, 0) and (0, 3, 0, 0, 1, 0) allow at least [...] requests to be granted in the round following the states (-2, 2, 3, 1, 2, 1) and (-2, 3, -1, 1, 1, -1), respectively. All the other request subsets allow at least 2 requests. Thus, depending on the probabilities of the various requests, the effect of not granting the maximum number of requests in the subsets (-2, 0, 0, 0, 2, 0) and (0, 3, 0, 0, 1, 0) can be compensated to some degree by possibly granting more requests in the following round. Of course, the relative values v_i of Howard's policy iteration algorithm (which can also be estimated from differences of the values V_i(n) in the value iteration algorithm) give the exact degree to which one request subset is preferable over another.

A possible rule of thumb for the decision in each state, so as to achieve near-optimal throughput of the Ringbus, is to grant some request subset utilizing the maximum number of segments. As we discussed earlier, the maximum number of segments constraint only slightly affects the optimal throughput. A more precise rule-of-thumb policy that we investigated is the following. In each state grant some request subset that:

1. utilizes the maximum number of segments,

2. has the maximum number of requests subject to 1, and

3. has the maximum number of the longest requests subject to 1 and 2 (i.e. a request subset with requests of length 3, 2, and 1 is preferable to one with requests of length 2, 2, and 2).

Constraint 2 serves mainly to reduce the number of eligible request subsets in each state while keeping the reward large. Constraint 3 ensures that long requests are granted before shorter ones (for subsets meeting constraints 1 and 2).
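The three constraints amount to a lexicographic preference, which the Python sketch below expresses as a sort key: total segments first, then number of requests, then the counts of requests of each length from the longest (S/2) down. The routing convention (slices 0 to S-1, segment k between slices k and k+1, positive offsets clockwise) and all function names are assumptions for illustration. Ties that survive all three constraints are broken by taking the first subset found, which matches the observation that an arbitrary choice among them should cost little throughput.

```python
from itertools import combinations

def footprint(i, r, S):
    """Assumed routing: r > 0 clockwise over segments i, i+1, ...;
    r < 0 counterclockwise over segments i-1, i-2, ...; returns (segments, destination)."""
    if r > 0:
        segs = frozenset((i + k) % S for k in range(r))
    else:
        segs = frozenset((i - 1 - k) % S for k in range(-r))
    return segs, (i + r) % S

def grantable_subsets(state):
    """Subsets of nonnull requests sharing no segment and no destination."""
    S = len(state)
    active = [i for i, r in enumerate(state) if r != 0]
    fp = {i: footprint(i, state[i], S) for i in active}
    for k in range(len(active) + 1):
        for subset in combinations(active, k):
            used, dests = set(), set()
            for i in subset:
                segs, dest = fp[i]
                if segs & used or dest in dests:
                    break
                used |= segs
                dests.add(dest)
            else:
                yield subset

def rule_key(lengths, S):
    """1) total segments, 2) number of requests,
    3) number of requests of length S/2, then S/2 - 1, ... (longest first)."""
    return (sum(lengths), len(lengths),
            tuple(lengths.count(L) for L in range(S // 2, 0, -1)))

def rule_of_thumb(state):
    """Grant the grantable subset that is best under rule_key (first one on ties)."""
    S = len(state)
    best = max(grantable_subsets(state),
               key=lambda sub: rule_key([abs(state[i]) for i in sub], S))
    return tuple(r if i in best else 0 for i, r in enumerate(state))

print(rule_key([3, 2, 1], 6) > rule_key([2, 2, 2], 6))   # True
print(rule_of_thumb((-2, 3, -1, 1, 1, 1)))               # (0, 3, 0, 0, 1, 1)
```

The first line of output reproduces the example in constraint 3: lengths 3, 2 and 1 beat lengths 2, 2 and 2 even though both use six segments with three requests.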

We investigated this rule-of-thumb policy by determining the estimated optimal throughput subject to these three constraints for the 91 sets of probabilities with p_0, p_1, and p_2 some integral multiple of .1. (We used these same sets of probabilities whenever we calculated the throughput for any variation of the Ringbus model with six slices.) For every set of probabilities considered, the estimated optimal throughput with these constraints was close (within ±.009) to the estimated optimal throughput with just the maximum number of segments constraint. Furthermore, in the vast majority of states there is only one request subset that meets constraints 1, 2, and 3. Thus, these constraints function well in reducing the number of possible decisions in each state without affecting the throughput by much. Quite a few states remain, however, for which there is still more than one request subset meeting the three constraints. An examination of these states revealed that for most states these remaining subsets are either related by symmetry or nearly identical. We believe that the throughput would remain essentially the same if, for each state, the request subset is selected arbitrarily from those meeting all three constraints. For that matter, we suspect that the throughput would remain approximately the same if, for each state, the request subset is selected arbitrarily from all those meeting the maximum number of segments constraint.
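The count of 91 probability sets can be reproduced by enumeration. The sketch below assumes the six-slice normalization p_0 + 2p_1 + 2p_2 + p_3 = 1, i.e. that p_1 and p_2 are the probabilities of one particular destination at distance 1 or 2 (each distance having two destinations), with p_3 taking whatever probability is left. This normalization is an inference from the probability points quoted in this section rather than a statement in it; the function name is hypothetical. Working in integer tenths avoids floating-point rounding in the feasibility test.

```python
from itertools import product

def probability_grid(tenths=10):
    """Six-slice probability sets (p0, p1, p2, p3) with p0, p1, p2 multiples of .1.

    Assumed normalization: p0 + 2*p1 + 2*p2 + p3 = 1, so p3 is the remainder.
    """
    grid = []
    for a, b, c in product(range(tenths + 1), repeat=3):
        rest = tenths - a - 2 * b - 2 * c      # p3, in tenths
        if rest >= 0:
            grid.append((a / tenths, b / tenths, c / tenths, rest / tenths))
    return grid

sets = probability_grid()
print(len(sets))                        # 91
print((0.0, 0.4, 0.0, 0.2) in sets)     # True
```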
3.5.1  Bounds on the Optimal Throughput

We now discuss three different bounds (all upper bounds) on the optimal throughput of the Ringbus. Caveats are:

1) the Ringbus has symmetrical access paths, i.e. it is a Symmetric Ringbus,

2) all slices have identical request probabilities and geometrically distributed processing times (as assumed in section 3.1), and

3) the duration of all grants is a single round.

All of the bounds can be extended to remove these restrictions. However, all these extensions (except from a symmetric to a non-symmetric Ringbus) complicate the calculation of the bounds and thus make the bounds less attractive.


3.5.1.1  Flow Model Bound

Denote the rate at which requests (null and nonnull) arrive at the Ringbus (in number of requests per round) from slice i by λ_i. Because of our symmetry assumptions, λ_i is the same for all slices; thus we simply denote the rate by λ. The rate at which nonnull requests arrive at the Ringbus from a slice is (1 - p_0)λ. Therefore the throughput of the Ringbus is S(1 - p_0)λ, where S is the number of slices.

We now consider the rate at which requests are granted from a slice for various destinations. This rate may be likened to a flow: nonnull requests flow into the Ringbus from one slice at the rate (1 - p_0)λ. The flow from a slice to a destination i segments away is p_iλ.† We assume that all requests of length 0 < i < S/2 are granted in the shortest direction and that requests of length S/2 are granted in the clockwise direction. Thus this flow divides in accordance with the clockwise or counterclockwise position of the destination relative to the source.

Since a given segment lies on the clockwise route of a length-i request from exactly i source slices, the total clockwise flow over a particular segment is

$$(1-p_0)\lambda \sum_{i=1}^{S/2} i\,\frac{p_i}{1-p_0} = \lambda \sum_{i=1}^{S/2} i\,p_i .$$

Similarly, the total counterclockwise flow over a particular segment is

$$(1-p_0)\lambda \sum_{i=1}^{S/2-1} i\,\frac{p_i}{1-p_0} = \lambda \sum_{i=1}^{S/2-1} i\,p_i .$$

Thus the total flow over a particular segment is

$$(1-p_0)\lambda \left[ \sum_{i=1}^{S/2-1} \frac{2i\,p_i}{1-p_0} + \frac{(S/2)\,p_{S/2}}{1-p_0} \right] = (1-p_0)\lambda\,\bar{l},$$

where \bar{l} is the average length of a request (in terms of the number of hops or segments required), given that the request is nonnull. By symmetry arguments, the flow is identical on all segments.

The total flow on any segment must not exceed 1 (i.e. one grant per round). Thus (1 - p_0)λ\bar{l} ≤ 1, and therefore, since the throughput is S(1 - p_0)λ,

$$t_{\mathrm{opt}} \le \frac{S}{\bar{l}} .$$

This bound is best for heavy traffic (i.e. p_0 ≈ 0), but even then it is not that good.

† The probability that a request is for a destination i segments away from the source, given that the request is nonnull, is p_i/(1 - p_0), yielding a flow of (1 - p_0)λ · p_i/(1 - p_0) = p_iλ.




The throughput of the Ringbus can be written as

$$t = \frac{S}{\dfrac{p_0}{1-p_0} + w + 1},$$

where p_0/(1 - p_0) is the average processing time (in rounds) and w is the average waiting time of a request (again in rounds); each slice repeatedly processes, waits, and is granted for one round. Since w ≥ 0, we have

$$t \le S(1-p_0),$$

yielding a tighter bound for light traffic, i.e. p_0 ≈ 1. Thus

$$t_{\mathrm{opt}} \le S \min\left( \frac{1}{\bar{l}},\; 1-p_0 \right). \qquad (3.19)$$


The effects of segment and/or destination conflicts must be included to get more useful bounds.
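Equation 3.19 is cheap to evaluate. The Python sketch below assumes, as in the flow argument, that p[i] is the probability of a request for one particular destination i segments away (so lengths 1 to S/2 - 1 each have two destinations and length S/2 has one), which implies the normalization p_0 + 2(p_1 + ... + p_{S/2-1}) + p_{S/2} = 1. The function names are hypothetical.

```python
def mean_request_length(p, S):
    """Average length l-bar (in segments) of a nonnull request.

    p[0] is the null-request probability; p[i] (1 <= i <= S/2) is the
    probability of one particular destination i segments away.
    """
    half = S // 2
    total = 2 * sum(i * p[i] for i in range(1, half)) + half * p[half]
    return total / (1 - p[0])          # undefined when p0 == 1 (no traffic)

def throughput(p0, w, S):
    """t = S / (p0/(1-p0) + w + 1): processing, waiting, then a one-round grant."""
    return S / (p0 / (1 - p0) + w + 1)

def flow_bound(p, S):
    """Equation 3.19: t_opt <= S * min(1 / l-bar, 1 - p0)."""
    return S * min(1 / mean_request_length(p, S), 1 - p[0])

# Heavy traffic, mainly short and long requests: l-bar = 1.4, bound = 6 / 1.4.
print(flow_bound((0.0, 0.4, 0.0, 0.2), 6))   # about 4.29
# Only requests of length 1: l-bar = 1, bound = 6 grants per round.
print(flow_bound((0.0, 0.5, 0.0, 0.0), 6))
```

Setting w = 0 in throughput gives S(1 - p_0), the light-traffic half of the bound, so the min in equation 3.19 simply picks whichever of the two limits binds.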


3.5.1.2  Crossbar Bound

An alternative way to obtain an upper bound on the throughput of the Ringbus is to consider a simpler model. One such simpler model is to consider only the destination of a request; in other words, ignore the segments that a request requires. For S slices, single-round grant durations, and ignoring request waiting times, the state description of such a model is

(r_1, r_2, ..., r_S)

where r_i is the destination (1, 2, ..., or S) of the request at slice i [...] null request at slice i. Alternatively, the destination may be expressed as the number of slices the destination slice is around the Ringbus from slice i,

r_i = -(S/2 - 1), ..., -1, 0, 1, ..., S/2,

with a negative quantity indicating the counterclockwise direction and a positive quantity the clockwise direction [...].
Since there can be only destination conflicts, this simpler model can be viewed as an S × S nondiagonal crossbar† interconnection. (Nondiagonal means that there are no crosspoint switches along the major diagonal.) This crossbar model has the same state description as the Ringbus model discussed at the beginning of section 3.5. The only difference between the two models is in the constraints. The Ringbus model has segment and destination requirements for each request, and the crossbar model has only destination requirements. Thus the crossbar model has fewer constraints on which requests may be granted simultaneously, i.e. it has more immediately grantable request sets and fewer request conflicts.

Therefore, merely by changing what constitutes a grantable request subset (a request subset in which all requests are grantable), the same computer program can be used to determine the optimal throughput for both the Ringbus and crossbar models. Figure 3.22 shows the optimal throughput for selected probabilities for the Ringbus and crossbar.
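The Python sketch below illustrates how only the grantability test separates the two models. It computes the largest number of requests that can be granted together in one round, once under Ringbus rules (no shared segment and no shared destination) and once under crossbar rules (no shared destination). The routing convention (slices 0 to S-1, segment k between slices k and k+1, positive offsets clockwise) and the function names are assumptions for illustration. When every slice of a six-slice ring requests the slice directly opposite, the crossbar grants all six requests, while the Ringbus grants only two, because each request occupies three of the six segments.

```python
from itertools import combinations

def footprint(i, r, S):
    """Segments used and destination of nonnull request r from slice i (assumed routing)."""
    if r > 0:
        segs = [(i + k) % S for k in range(r)]
    else:
        segs = [(i - 1 - k) % S for k in range(-r)]
    return segs, (i + r) % S

def max_granted(state, ringbus=True):
    """Largest number of requests grantable together in one round.

    ringbus=True:  requests may share neither a segment nor a destination.
    ringbus=False: crossbar rules, only destinations must differ.
    """
    S = len(state)
    active = [i for i, r in enumerate(state) if r != 0]
    fp = {i: footprint(i, state[i], S) for i in active}
    for k in range(len(active), 0, -1):          # try the largest subsets first
        for subset in combinations(active, k):
            if len({fp[i][1] for i in subset}) < k:
                continue                          # destination conflict
            if ringbus:
                segs = [s for i in subset for s in fp[i][0]]
                if len(set(segs)) < len(segs):
                    continue                      # segment conflict
            return k
    return 0

state = (3, 3, 3, 3, 3, 3)      # every slice requests the opposite slice
print(max_granted(state, ringbus=True), max_granted(state, ringbus=False))   # 2 6
```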

The optimal throughput of the Ringbus is close to that for the crossbar when p_0 is large (i.e. light loading) and when p_1 is large. For most other probability sets, and especially for large p_3, the throughput of the crossbar exceeds that of the Ringbus by a great deal. This is to be expected, since the crossbar does not have any of the segment conflicts which comprise the majority of the conflicts in the Ringbus.

The chief value of the crossbar bound is to allow a comparison between the performance of the Ringbus interconnection scheme and that of a crossbar interconnection, which has the best performance achievable. The crossbar bound is, of course, a bound on the optimal throughput of the Ringbus, but it is as difficult to compute as the optimal throughput of the Ringbus itself (since both the Ringbus and crossbar models have the same large state space).


† Where the interconnection must be circuit-switched with S sources and S destinations.
3.5.1.3  Number of Segments Bound

Another simple model of the Ringbus is to consider only the segments required by each request and ignore the destination of each request. This model captures the essence of the Ringbus better than the crossbar model, but it still has the same large state space and thus is useless for obtaining a practical bound. In order to reduce the size of the state space we consider an even simpler model of the Ringbus. Now we consider only the number of segments required by each request and ignore the particular segments and destination required by each request. For S slices, single-round grant durations, and ignoring request waiting times, the state description of this model reduces to

(m_0, m_1, ..., m_{S/2})

where m_0 is the number of null requests, m_i, for 1 ≤ i ≤ S/2, is the number of requests requiring i segments, 0 ≤ m_i ≤ S for 0 ≤ i ≤ S/2, and

$$\sum_{i=0}^{S/2} m_i = S .$$

The only constraint on granting requests is that the total number of segments required by the requests not exceed the number of segments S. Thus a state is immediately grantable if

$$\sum_{i=1}^{S/2} i\,m_i \le S .$$

The total number of states is

$$\binom{S + S/2}{S/2} .$$

For S = 6 this model has 84 states, as compared to the 4003 states of the original Ringbus model (after symmetry is removed).
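The state count is a stars-and-bars count: a state distributes S requests among the S/2 + 1 length classes 0, 1, ..., S/2. The Python sketch below enumerates the states directly for S = 6, checks the count against the binomial coefficient, and applies the single grantability constraint; the function names are illustrative.

```python
from itertools import product
from math import comb

def segment_model_states(S):
    """All (m_0, m_1, ..., m_{S/2}) with nonnegative entries summing to S."""
    half = S // 2
    return [m for m in product(range(S + 1), repeat=half + 1) if sum(m) == S]

def immediately_grantable(m, S):
    """The only constraint: total segments requested must not exceed S."""
    return sum(i * m_i for i, m_i in enumerate(m)) <= S

states = segment_model_states(6)
print(len(states), comb(6 + 3, 3))                        # 84 84
print(sum(immediately_grantable(m, 6) for m in states))   # grantable states
```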

Figure 3.23 shows the optimal throughput of this model, which we call the number of segments model, and the optimal throughput of the Ringbus for various request probabilities. The number of segments model yields an excellent upper bound on the optimal throughput for light traffic (i.e. p_0 large) and for p_3 > .8. The quality of the bound degrades as p_2 and especially as p_1 increases. This behavior is to be expected, since the number of segments model ignores destination conflicts and the particular segments required by each request. These two factors dominate the performance of the Ringbus for heavy traffic and short request lengths. The bound is worst for p_1 = .5, p_2 = p_3 = 0. At this point, the number of segments model gives a bound of 6.0 on the optimal throughput, whereas the optimal throughput of the Ringbus at this point is 4.22 (grants per round).
An examination of the estimated optimal decisions in each state of the number of segments model revealed the same general trend as that in the Ringbus model: request subsets with long requests (i.e. requests requiring many segments) were increasingly favoured over ones with only shorter requests as the traffic increased (i.e. as p_0 decreased). This trend was most pronounced when p_1 was large, p_2 = 0, and p_3 small.

We computed the optimal throughput of the number of segments model subject to the two different constraints investigated earlier for the Ringbus model: the maximum reward and maximum number of segments constraints. Our findings again parallel those discussed earlier for the Ringbus model. The optimal throughput with the maximum number of segments constraint was indistinguishable (within the ±.005 tolerance on the optimum from the value iteration algorithm) from the unconstrained optimal throughput. The optimal throughput with the maximum reward constraint was less than the unconstrained optimal throughput in about the same region for which the optimal throughput of the Ringbus model with the maximum reward constraint was less than the unconstrained optimal throughput of the Ringbus model. (See Figure 3.19 for this latter region.)

3.5.1.4  Discussion

There is usually a tradeoff between the tightness of a bound and the ease of its calculation. Tight bounds tend to be complex and difficult to calculate, while loose bounds tend to be simple and easy to calculate. Unfortunately, the Ringbus model is very complex, as evidenced by its large number of states. This suggests that any really tight bounds on the throughput of the Ringbus in all cases will also be very complex and difficult to calculate.

The bounds we investigated are examples from the spectrum of the tradeoff between the tightness of a bound and its ease of calculation. The average number of segments bound is simple but not very accurate. The crossbar bound is extremely difficult to calculate (as difficult as the optimum Ringbus throughput itself). The main purpose of the crossbar bound is to provide the performance of the best possible interconnection network for comparison with the performance of the Ringbus. The number of segments bound is the best of the three different bounds investigated, except when p_1 is large, in which case it is the worst bound.
The number of segments bound has a further significant advantage over the other bounds: it yields some idea of the optimal decisions in the Ringbus model.
3.6 Optimal Arbiter for Eight Slices

In this case the state description with grant durations of one round is

$$(r_1, r_2, r_3, \ldots, r_8)$$

where $r_i = -3, -2, -1, 0, 1, 2, 3$, or $4$, as discussed in section 3.2. This yields $8^8 = 16{,}777{,}216$ states. By utilizing rotational and flip symmetry in the state description, the number of states can be reduced by a factor of less than 16†, which still yields over 1,000,000 states. Needless to say, this huge number of states makes the pursuit of the optimum throughput and corresponding optimum policy very difficult for general request probabilities. Based on our experience with the value iteration algorithm for determining the optimum throughput with six slices, we concluded that such an algorithm would be impractical for eight slices with the computational resources available to us. The optimum throughput can still be determined rather easily for some special cases with a small number of states.

One special case that we investigated is the optimum throughput along the axes of the feasible probability region (i.e. only one request probability nonzero). Figure 3.24 shows the optimum throughput along each axis of the feasible probability region. Another special case is the optimum throughput on a face of the feasible probability region (i.e. with only two request probabilities nonzero). We did not investigate this case.

Bounds and approximations are the only practical methods to obtain some idea of the optimum throughput for general request probabilities. However, some idea of the general characteristics of the throughput is also useful. We discuss such characteristics in section 3.6.1. Any of the bounds discussed in section 3.5.1 can be applied, although the Markovian decision formulation bounds and the crossbar bound are not very practical due to their large computational requirements. We examine the number of segments bound in section 3.6.2. One simple approximation is to replace all nonnull requests by requests of a single length closest to the mean request length (given that a nonnull request occurs)

$$\bar{l} = \frac{2p_1 + 4p_2 + 6p_3 + 4p_4}{1 - p_0}.$$

Another approximation is

$$t_{opt} \approx \sum_j p_{1j}\, q_j^{max}.$$

We expect this to be an excellent approximation again, but it is rather difficult to calculate. The difficulty is in determining $q_j^{max}$ in each of the $8^8$ states; $p_{1j}$ is trivial to determine.
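The mean request length is simple arithmetic, and a minimal Python sketch of the single-length approximation follows. It assumes that $p_1$, $p_2$, $p_3$ are per-direction probabilities (a request of length $i$ occurs clockwise with probability $p_i$ and counterclockwise with probability $p_i$), so that $p_0 = 1 - 2(p_1 + p_2 + p_3) - p_4$; this is consistent with the maximum at $p_1 = .5$ but is our reading. The function names are hypothetical, and ties between two candidate lengths are broken toward the shorter length, an arbitrary choice.

```python
def mean_request_length(p1: float, p2: float, p3: float, p4: float) -> float:
    """Mean length of a nonnull request for an eight-slice Ringbus.

    Assumption: p1, p2, p3 are per-direction probabilities and p4 is the
    probability of a length-4 (S/2) request, so p0 = 1 - 2(p1+p2+p3) - p4.
    """
    p0 = 1.0 - 2.0 * (p1 + p2 + p3) - p4
    if p0 < -1e-12:
        raise ValueError("request probabilities exceed 1")
    if p0 >= 1.0:
        raise ValueError("mean is undefined when every request is null")
    return (2 * p1 + 4 * p2 + 6 * p3 + 4 * p4) / (1.0 - p0)


def single_length_approximation(p1: float, p2: float, p3: float, p4: float) -> int:
    """Request length (1 to 4) closest to the mean; ties go to the shorter length."""
    lbar = mean_request_length(p1, p2, p3, p4)
    return min(range(1, 5), key=lambda length: abs(length - lbar))


# Example: p1 = .2 and p4 = .2 give p0 = .4 and a mean nonnull length of 2.
print(round(mean_request_length(0.2, 0.0, 0.0, 0.2), 6))
print(single_length_approximation(0.2, 0.0, 0.0, 0.2))
```

The approximation then amounts to evaluating the Ringbus with every nonnull request of the returned length.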


† At best, 16 states (corresponding to 8 rotations and 2 flips) can be reduced to one state. This reduction factor can only be attained for certain states with no null requests and no requests of length 4.


Figure 3.24: Optimal throughput of Ringbus with eight slices and
3.6.1 General Characteristics of the Optimum Throughput

The optimum throughput is a function of the request probabilities, i.e. $t_{opt}(p_1, p_2, p_3, p_4)$. In this section we consider the general shape of this function.


3.6.1.1 Slope for Very Light Traffic

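From equation 3.10 we have $t_{opt} = \sum_k p_{1k}\,(v_k^{opt} - v_1^{opt})$. For very light traffic, i.e. $p_0 \approx 1$, $v_k^{opt} - v_1^{opt} \approx n_k$, where $n_k$ is the number of nonnull requests in state $k$. This can be seen from equation 3.13:

$$v_k^{opt} - v_1^{opt} = \sum_{m=0}^{\infty} \Big( \sum_j p_{kj}^{(m)} q_j^{opt} - \sum_j p_{1j}^{(m)} q_j^{opt} \Big).$$

For $p_0 \approx 1$,

$$p_{kj} \approx \begin{cases} 1 & \text{if } j \text{ is the leftover of state } k \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad p_{1j} \approx \begin{cases} 1 & j = 1 \\ 0 & j \neq 1 \end{cases}$$

(where state 1 is the state with all null requests). Of course $q_1^{opt} = 0$. Thus

$$v_k^{opt} - v_1^{opt} \approx q_k^{opt} + q_{l_1}^{opt} + q_{l_2}^{opt} + \cdots = n_k$$

where state $l_1$ is the leftover of state $k$, state $l_2$ is the leftover of state $l_1$, etc., until the leftover is state 1 with all null requests. (With no new arrivals, every nonnull request in state $k$ is eventually granted, so the rewards along this chain sum to $n_k$.) Therefore for $p_0 \approx 1$ we have

$$t_{opt} \approx \sum_k p_{1k}\, n_k.$$

Now if $p_i = \delta$ for some $1 \le i < S/2$, where $\delta$ is very small and positive, and $p_j = 0$ for all $j \neq 0$ and $j \neq i$, then each slice independently requests length $i$ in each direction with probability $\delta$ and makes a null request with probability $1 - 2\delta$, so that

$$p_{1k} = \delta^{n_k} (1 - 2\delta)^{S - n_k}.$$

Therefore $t_{opt} \approx 2S\delta$ (the mean of a binomial distribution with parameters $S$ and $2\delta$) and thus $t_{opt}/p_i \approx 2S$. Taking the limit as $\delta \to 0$, we have

$$\left.\frac{\partial t_{opt}}{\partial p_i}\right|_{p_0 = 1} = 2S, \qquad 1 \le i < S/2.$$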

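If $p_{S/2} = \delta$, where $\delta$ is very small and positive, and $p_j = 0$ for all $j \neq 0$ and $j \neq S/2$, then

$$p_{1k} = \delta^{n_k} (1 - \delta)^{S - n_k}.$$

Therefore $t_{opt} \approx S\delta$ and thus $t_{opt}/p_{S/2} \approx S$. Taking the limit as $\delta \to 0$, we have

$$\left.\frac{\partial t_{opt}}{\partial p_{S/2}}\right|_{p_0 = 1} = S.$$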
Note that these slopes are reflected in the drawings in Figure 3.24.
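The step from $t_{opt} \approx \sum_k p_{1k} n_k$ to the slopes can be checked numerically. The Python sketch below, a hypothetical helper rather than part of the original analysis, enumerates every request state of an $S$ slice Ringbus with a single nonzero request probability and evaluates the sum directly. It takes the light-traffic approximation $v_k^{opt} - v_1^{opt} \approx n_k$ as given and does not compute $t_{opt}$ itself.

```python
from itertools import product


def light_traffic_throughput(S: int, length: int, delta: float) -> float:
    """Evaluate sum_k p_{1k} n_k by enumerating every request state k.

    Assumptions: only requests of one length occur. For length < S/2 a slice
    requests +length or -length with probability delta each; for
    length == S/2 it makes one kind of nonnull request with probability delta.
    Otherwise its request is null.
    """
    if length < S // 2:
        choices = [(0, 1.0 - 2.0 * delta), (length, delta), (-length, delta)]
    else:
        choices = [(0, 1.0 - delta), (length, delta)]
    total = 0.0
    for state in product(choices, repeat=S):
        prob = 1.0
        n_k = 0  # number of nonnull requests in this state
        for request, pr in state:
            prob *= pr
            n_k += request != 0
        total += prob * n_k
    return total


delta = 1e-4
for length in (1, 2, 3, 4):
    t = light_traffic_throughput(8, length, delta)
    print(length, round(t / delta, 6))  # ratio t/p_i, i.e. the light-traffic slope
```

For $S = 8$ the ratio is $2S = 16$ for lengths 1 to 3 and $S = 8$ for length 4, matching the slopes derived above.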

3.6.1.2  Shape  Along  a  Ray  with  Fixed  Ratio  of  Nonnull  Probabilities 

For any arbitrary value of $S$ the characteristics of the shape along a ray are similar to those discussed in section 3.4.1.2 for four slices.

3.6.1.3  Maximum  Points 

At any point in the feasible probability region, the throughput increases if $p_1$ increases by some positive amount $\delta$. (This may require that the probability of other request lengths decrease.) Thus there are no maxima in the interior of the feasible probability region; the maximum must occur on the boundary. Obviously, the unique maximum occurs at $p_1 = .5$ and the unique minimum occurs at $p_0 = 1.0$.

3.6.1.4 Shape Along Cross Sections

The throughput increases monotonically along any cross section parallel to the $p_1$ axis since $\partial t / \partial p_1 > 0$ ($t$ is the throughput). Along other cross sections, such as those parallel to the $p_4$ axis, the throughput may both increase and decrease. (For example, in Figure 3.17 the throughput decreases as $p_3$ increases for $p_1 = .2$ and $p_2 = .1$.)

3.6.2 Number of Segments Bound

To obtain some idea of the optimum throughput of the Ringbus model with $S = 8$ and grant durations of one round for general request probabilities, we calculated the optimum throughput of the number of segments model for $S = 8$ with selected request probabilities. Table 3.3 lists the results, which we obtained via value iteration, to within ±.005 of optimum. For comparison, Table 3.3 also lists the optimum throughput of the Ringbus model for those request probabilities in Table 3.3 for which it is known. These request probabilities (for which $t_{opt}$ is known) all correspond to points along the axes of the feasible probability region. Note that $t_{\text{number of segments}}$ is a poor bound for $t_{opt}$ for large $p_1$, as observed for $S = 6$ in section 3.5.1.4. Otherwise, we expect that $t_{\text{number of segments}}$ is a reasonable bound for $t_{opt}$, as observed for $S = 6$.
Request Probabilities          Throughput
p_1    p_2    p_3    p_4       Number of Segments Model    Ringbus Model (t_opt)
0.2    0.0    0.0    0.0       3.20                        2.96
0.4    0.0    0.0    0.0       6.40                        4.94
0.5    0.0    0.0    0.0       8.00                        5.63
0.0    0.2    0.0    0.0       3.09                        2.43
0.2    0.2    0.0    0.0       5.23                        ?
0.0    0.4    0.0    0.0       3.99                        3.10
0.0    0.5    0.0    0.0       4.00                        3.22
0.0    0.0    0.2    0.0       1.99                        1.96
0.2    0.0    0.2    0.0       3.86                        ?
0.0    0.0    0.4    0.0       2.00                        2.00
0.0    0.0    0.5    0.0       2.00                        2.00
0.0    0.0    0.0    0.2       1.51                        1.32
0.2    0.0    0.0    0.2       3.75                        ?
0.4    0.0    0.0    0.2       4.99                        ?
0.0    0.2    0.0    0.2       3.00                        ?
0.2    0.2    0.0    0.2       4.00                        ?
0.0    0.0    0.2    0.2       2.00                        ?
0.2    0.0    0.2    0.2       3.29                        ?
0.0    0.2    0.2    0.2       2.86                        ?
0.0    0.0    0.4    0.2       2.00                        ?
0.0    0.0    0.0    0.4       1.99                        1.90
0.2    0.0    0.0    0.4       3.20                        ?
0.0    0.2    0.0    0.4       2.67                        ?
0.0    0.0    0.2    0.4       2.00                        ?
0.0    0.0    0.0    0.6       2.00                        2.00
0.2    0.0    0.0    0.6       2.85                        ?
0.0    0.2    0.0    0.6       2.50                        ?
0.0    0.0    0.2    0.6       2.00                        ?
0.0    0.0    0.0    1.0       2.00                        2.00

Table 3.3: Results from number of segments model for eight slices


An examination of the estimated optimal decision in each state of the number of segments model revealed that the number of states with non-maximum reward decisions increased as the request probabilities became dominated by short (i.e. length 1) and long (i.e. length 3 and 4) requests. Otherwise the number of states with less than the maximum reward was quite small. In fact, as long as $p_1$ and $p_3$ were both small, the estimated optimal decision in each state almost always gave the maximum reward. The optimal throughput of the number of segments model with a maximum reward constraint was very close to the unconstrained optimal throughput except when there were mostly short and long requests. Of the request probabilities listed in Table 3.3, the degradation caused by the maximum reward constraint was greatest (0.40) for $p_1 = .4$, $p_2 = 0$, $p_3 = 0$, and $p_4 = .2$. On the other hand, the optimum throughput of the number of segments model with a maximum number of segments constraint was indistinguishable from the unconstrained optimal throughput $t_{\text{number of segments}}$ for all request probabilities listed in Table 3.3 except for $p_1 = .4$, $p_2 = 0$, $p_3 = 0$, and $p_4 = .2$. This comes as no surprise since the estimated optimal decision in each state in the unconstrained case almost always utilized the maximum number of segments.

These observations suggest that the trends in the optimal decisions for the Ringbus model for $S = 6$, discussed in section 3.5, continue for $S = 8$. In particular, these observations suggest that the maximum reward constraint has an even greater effect on the optimum throughput of the Ringbus for $S = 8$ than for $S = 6$, reflecting the sharper contrast between short and long request
3.7 The Symmetric Ringbus With More Than Eight Slices

Any pursuit of the optimum throughput and/or optimum policy for more than eight slices and general request probabilities seems hopeless. As the number of slices increases much past eight, there even begin to be too many states to compute the optimum throughput on the faces and along the axes representing requests of length less than $S/2$. (The number of states along these axes is $3^S$, where $S$ is the number of slices. Only $2^S$ states are required to compute the optimum along the axis representing requests of length $S/2$. This number can be reduced further, as we discuss in section 3.7.1.) Of course, the general characteristics of the throughput as discussed in section 3.6.1 remain the same for more than eight slices. In addition, the bounds discussed previously, particularly the number of segments bound, can still be effectively applied (although the number of states for the number of segments bound increases rapidly above eight slices).

3.7.1 Throughput as a Function of the Number of Slices for Some Special Cases

Two special cases for which it is easy to determine the optimum throughput of the Ringbus for a large number of slices are

(i) at an extreme point of the feasible probability region, i.e. at a point where $p_i = .5$ and $p_j = 0$ for $j \neq i$, for some $0 < i < S/2$, and

(ii) along the axis corresponding to requests of length $S/2$, i.e. $p_i = 0$ for $i = 1, 2, \ldots, S/2 - 1$.

Using rotational and flip symmetry, the $2^S$ states in case (i) can be reduced by a significant fraction.

One extreme point of particular interest in case (i) is $p_1 = .5$, where the maximum throughput occurs. We can easily obtain bounds on this maximum throughput for a large number of slices as follows.

Let the number of requests in a round in the clockwise direction be denoted by $n_{cw}$ and the number of requests in a round in the counterclockwise direction be denoted by $n_{ccw}$ ($n_{cw} + n_{ccw} = S$). Since all the requests are nonnull and of length one, requests in the same direction never share a segment or a destination, so we can grant at least $\max(n_{cw}, n_{ccw})$ requests in a round. Imagine an arbiter which operates by granting exactly $\max(n_{cw}, n_{ccw})$ requests in every round. Since an optimal arbiter can grant at least this number of requests in every round, the throughput of the Ringbus with this clockwise-counterclockwise arbiter (which we term the cw-ccw arbiter) gives a lower bound on the optimum throughput of the Ringbus for $p_1 = .5$.
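The cw-ccw arbiter is simple enough to simulate directly. The Monte Carlo sketch below, in Python, is an illustration under stated assumptions and not the method behind Table 3.4. Every slice always holds one length-one request (at $p_1 = .5$ we have $p_0 = 0$), each new request is clockwise or counterclockwise with probability 1/2, and ties between the two directions are broken in favour of clockwise, which is an arbitrary choice. The function name is hypothetical.

```python
import random


def simulate_cw_ccw(S: int, rounds: int, seed: int = 0) -> float:
    """Estimate the throughput of the cw-ccw arbiter for p_1 = .5.

    Each slice holds exactly one length-one request, +1 (clockwise) or
    -1 (counterclockwise). Every round the arbiter grants all requests in the
    majority direction (ties go clockwise, an arbitrary choice). Granted
    slices draw a fresh direction; the others keep their leftover request.
    """
    rng = random.Random(seed)
    dirs = [rng.choice((1, -1)) for _ in range(S)]
    granted = 0
    for _ in range(rounds):
        n_cw = dirs.count(1)
        winner = 1 if n_cw >= S - n_cw else -1
        for i in range(S):
            if dirs[i] == winner:
                granted += 1                   # max(n_cw, n_ccw) grants per round
                dirs[i] = rng.choice((1, -1))  # new request for the next round
    return granted / rounds


print(simulate_cw_ccw(4, 100_000))
```

A long run for $S = 4$ should land close to 2.833, the value listed for $t_{cw\text{-}ccw}$ in Table 3.4.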

An obvious state description of the Ringbus with the cw-ccw arbiter is

$$(n_{cw}, n_{ccw}).$$

However, we can reduce the number of states by utilizing the symmetry between the clockwise and counterclockwise requests. Thus we consider instead the state description

$$(m, n)$$

where $m = \max(n_{cw}, n_{ccw})$ (i.e. $m \ge S/2$) and $m + n = S$. It is convenient to number the states with $n$, $n = 0, 1, 2, \ldots, S/2$. The reward in each state is $m$. The one step transition probability from state $(m, n)$ to state $(m', n')$ is given by

$$p_{nn'} = \begin{cases} \left(\tfrac{1}{2}\right)^{S-n} \left[ \binom{S-n}{n'-n} + \binom{S-n}{n'} \right], & n' < S/2 \\[4pt] \left(\tfrac{1}{2}\right)^{S-n} \binom{S-n}{S/2-n}, & n' = S/2 \end{cases}$$

where

$$\binom{a}{b} = \begin{cases} \dfrac{a!}{b!\,(a-b)!} & \text{if } a \text{ and } b \text{ are integers and } b = 0, 1, \ldots, a \\ 0 & \text{otherwise.} \end{cases}$$


This expression for $p_{nn'}$ may be understood as follows. The reward in state $(m, n)$ is $S - n$; hence the next state has $S - n$ new requests. There are two ways for this next state to be $(m', n')$ if $n' < S/2$:

1) $n' - n$ (where $n' \ge n$) of the $S - n$ new requests are in the same direction as the $n$ old requests inherited from state $(m, n)$, and the state is not "flipped" (i.e. $S - n' > n'$).

2) $n'$ of the $S - n$ new requests are in the opposite direction to the $n$ old requests inherited from state $(m, n)$, and the state is "flipped" (i.e. $n' < S - n'$).

There is only one way for the next state to be $(m', n')$ if $n' = S/2$, since the state is never "flipped" in this case.


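The throughput of the Ringbus with the cw-ccw arbiter is given by

$$t_{cw\text{-}ccw} = \sum_{n=0}^{S/2} \pi_n (S - n)$$

where $\pi_n$ is the steady state probability of being in state $n$. The $\pi_n$ satisfy

$$\pi_n = \sum_{n'=0}^{S/2} \pi_{n'}\, p_{n'n}, \quad n = 0, 1, 2, \ldots, S/2, \qquad \text{and} \qquad \sum_{n'=0}^{S/2} \pi_{n'} = 1.$$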
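These stationary equations are small enough to solve numerically even for large $S$. A minimal Python sketch follows. It builds $p_{nn'}$ exactly as given above and finds $\pi_n$ by power iteration, which is a convenient choice here and not necessarily the method used to produce Table 3.4; the function names are hypothetical. Under these assumptions it gives $17/6 \approx 2.833$ for $S = 4$ and $432/104 \approx 4.154$ for $S = 6$, in agreement with Table 3.4.

```python
from math import comb
from typing import List


def cw_ccw_transition_matrix(S: int) -> List[List[float]]:
    """One-step probabilities p_{n n'} for states n = 0, ..., S/2, where n is
    the number of leftover (minority-direction) requests."""
    half = S // 2
    P = [[0.0] * (half + 1) for _ in range(half + 1)]
    for n in range(half + 1):
        new = S - n                  # the S - n granted slices issue new requests
        scale = 0.5 ** new
        for n2 in range(half + 1):
            if n2 < half:
                same = comb(new, n2 - n) if n2 >= n else 0  # not flipped
                opposite = comb(new, n2)                    # flipped
                P[n][n2] = scale * (same + opposite)
            else:
                P[n][n2] = scale * comb(new, half - n)
    return P


def cw_ccw_throughput(S: int, tol: float = 1e-13, max_iter: int = 10_000) -> float:
    """t_cw-ccw = sum_n pi_n (S - n), with pi found by power iteration."""
    if S < 2 or S % 2:
        raise ValueError("S must be a positive even number")
    P = cw_ccw_transition_matrix(S)
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(max_iter):
        nxt = [sum(pi[a] * P[a][b] for a in range(size)) for b in range(size)]
        done = max(abs(x - y) for x, y in zip(nxt, pi)) < tol
        pi = nxt
        if done:
            break
    return sum(pi[n] * (S - n) for n in range(size))


for S in (4, 6, 8):
    print(S, round(cw_ccw_throughput(S), 3))
```

Because the chain has only $S/2 + 1$ states, the same function can be run for $S$ in the hundreds to watch $t_{cw\text{-}ccw}/S$ approach 2/3.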


We computed $t_{cw\text{-}ccw}$ for various values of $S$; the results are listed in Table 3.4 along with the optimum throughput of the Ringbus for 4, 6, and 8 slices. Note that the lower bound given by $t_{cw\text{-}ccw}$ is equal to the optimum throughput for 4 slices. The lower bound is progressively less tight for 6 and 8 slices. We expect that this trend continues as the number of slices increases further. As $S \to \infty$, an average of 2/3 of the requests are granted†; hence $t_{cw\text{-}ccw}/S \to 2/3$, as the figures in Table 3.4 indicate.


S      t_cw-ccw     t_opt
4      2.833        2.833
6      4.154        4.23
8      5.473        5.63
10     6.792        ?
12     8.112        ?
14     9.433        ?
16     10.755       ?
18     12.078       ?
20     13.403       ?
22     14.729       ?
24     16.055       ?
26     17.382       ?
28     18.709       ?
30     20.038       ?
32     21.367       ?

Table 3.4: $t_{cw\text{-}ccw}$ for various values of $S$


We can obtain an upper bound on the optimum throughput of the Ringbus for $p_1 = .5$ and $S$ even by considering only destination conflicts. Number the slices from 1 to $S$ in the clockwise direction around the Ringbus. Odd numbered slices only request even numbered slices and even numbered slices only request odd numbered slices. Thus, ignoring the segment conflicts, the Ringbus is equivalent for $p_1 = .5$ to two $S/2 \times S/2$ crossbars, one connecting odd sources to even destinations and the other connecting even sources to odd destinations. Each of these crossbars consists of $S/4$ cells as depicted in Figure 3.25.


† For large $S$, if $m$ requests are granted in the current round, then the average number of requests granted in the next round, $\bar{m}$, is the number of leftovers from the current round plus half of the new requests that arrive in the next round, i.e. $\bar{m} = (S - m) + m/2$. In steady state $\bar{m} = m$, hence $m = \frac{2}{3} S$.
Figure 3.25: A crossbar cell


Suppose we ignore the interactions between cells. (The cells interact via conflicts at the destinations they share with adjacent cells.) Each cell is thus independent of all the others. Under this condition, it is easy to establish that the throughput of a cell is $t_{cell} = 7/4$. The total number of cells is $S/2$, hence

$$t_{opt} \le \frac{S}{2}\, t_{cell} = \frac{7}{8} S.$$

Considering that this is also a bound on the throughput of a crossbar with $p_1 = .5$ and $S$ slices, this is a poor upper bound for the Ringbus. Examining Table 3.4, it appears that $t_{opt}/S$ decreases monotonically with $S$. This leads us to make the following conjecture.


Conjecture

Let $t_{opt}^S$ denote the optimum throughput of an $S$ slice Ringbus with request probabilities $p_i^S$, $i = -(S/2 - 1), \ldots, S/2$ if $S$ is even, and $i = -(S-1)/2, \ldots, (S-1)/2$ if $S$ is odd. Let $t_{opt}^{S+1}$ denote the optimum throughput of an $S + 1$ slice Ringbus with request probabilities

$$p_i^{S+1} = \begin{cases} p_i^S, & |i| = 0, 1, 2, \ldots, S/2 - 1 \\ p_{S/2}^S / 2, & i = \pm S/2 \end{cases} \qquad \text{if } S \text{ is even}$$

and

$$p_i^{S+1} = \begin{cases} p_i^S, & |i| = 0, 1, 2, \ldots, (S-1)/2 \\ 0, & \text{otherwise} \end{cases} \qquad \text{if } S \text{ is odd.}$$

Then, assuming grant durations of a single round, $\dfrac{t_{opt}^{S+1}}{S+1} \le \dfrac{t_{opt}^S}{S}$. Note that this generalizes to
Unfortunately, this conjecture seems difficult to prove. If it is true, then a much better bound on the optimum throughput of the Ringbus for $p_1 = .5$ is

$$t_{opt}^S \le \frac{S}{4}\,(2.833) \qquad \text{for } S \ge 4.$$

For large $S$ the conjecture leads to

$$\frac{t_{opt}^S}{S} \le \frac{t_{opt}^8}{8} = .704 < \frac{t_{opt}^6}{6} = .705 < \frac{t_{opt}^4}{4} = .708.$$

It is not easy to obtain good lower bounds on the other extreme points $p_i = .5$ ($1 < i < S/2$), since the possibility of requests in the same direction conflicting introduces additional complexity. One way to obtain a lower bound on the optimum throughput for $p_i = .5$, $0 < i < S/2$, is to construct a Ringbus in which all requests are of length 1 by deleting the $i - 1$ slices between every $i$th slice. Of course, this is only successful (although it can be modified) if $S/i$ is an integer. As an example, consider $S = 8$ and $p_2 = .5$. After deleting every second slice, we obtain a 4 slice Ringbus with all requests of length 1. The throughput for such a Ringbus is 2.833; hence a lower bound on the throughput of the eight slice Ringbus with $p_2 = .5$ is 2.833. The exact throughput in this case is 3.22 (see Figure 3.24).

For some extreme points $t_{opt} = 2$. This is obviously true, for example, for $p_{S/2} = 1$. It is also true for $p_i = .5$ when $\lfloor S/i \rfloor = 2$ ($S$ even).

The optimum throughput along the $S/2$ axis is easy to calculate for large $S$, since the number of states can be greatly reduced from the $2^S$ mentioned earlier. If the only nonnull requests are of length $S/2$, it suffices for a state description to merely describe the number of pairs of slices with zero, one, and two nonnull requests, where two slices 180° apart on the Ringbus comprise a pair. Thus the state description is

$$(n_0, n_1, n_2)$$

where $n_i$, $i = 0, 1$, or $2$, is the number of pairs with $i$ nonnull requests and $\sum_{i=0}^{2} n_i = S/2$ ($S$ even). The total number of states is

$$\binom{S/2 + 2}{2} = \frac{(S/2 + 2)(S/2 + 1)}{2}.$$
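The reduction from $2^S$ request patterns to $(n_0, n_1, n_2)$ states can be checked by brute force for small $S$. The Python sketch below, a hypothetical helper that assumes slices $i$ and $i + S/2$ form a pair, enumerates every pattern of length-$S/2$ requests and counts the distinct pair states, for comparison with $(S/2 + 2)(S/2 + 1)/2$.

```python
from itertools import product


def pair_states(S: int) -> set:
    """Collapse every request pattern on the S/2 axis to (n0, n1, n2).

    A pattern marks each slice 1 if it holds a nonnull (length S/2) request
    and 0 otherwise; slices i and i + S/2 form a pair (S even).
    """
    half = S // 2
    states = set()
    for pattern in product((0, 1), repeat=S):
        counts = [0, 0, 0]
        for i in range(half):
            counts[pattern[i] + pattern[i + half]] += 1
        states.add(tuple(counts))
    return states


for S in (4, 8, 12):
    # patterns, distinct pair states, closed-form count
    print(S, 2 ** S, len(pair_states(S)), (S // 2 + 2) * (S // 2 + 1) // 2)
```

The state count grows only quadratically in $S$, which is what makes this axis tractable for large $S$.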

A lower bound on the optimum throughput along the $S/2$ axis can be obtained easily by ignoring leftover requests. Certainly,




$$t_{opt} \ge 2 \cdot \text{Prob(at least 1 pair has 2 requests)} + 1 \cdot \text{Prob(at least 1 pair has 1 request and no pair has 2 requests)}$$

$$t_{opt} \ge 2 \cdot \text{Prob(at least 1 pair has 2 requests)} + 1 - \text{Prob(at least 1 pair has 2 requests)} - \text{Prob(no pair has any requests)}$$

Now Prob(at least 1 pair has 2 requests) $= 1 -$ Prob(no pair has 2 requests) $= 1 - (1 - p_{S/2}^2)^{S/2}$ and Prob(no pair has any requests) $= \left((1 - p_{S/2})^2\right)^{S/2}$, thus

$$t_{opt} \ge 2 - (1 - p_{S/2}^2)^{S/2} - (1 - p_{S/2})^S.$$
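The closed form can be cross-checked against a direct enumeration of the same leftover-free argument. In the Python sketch below, whose function names are hypothetical, $p$ stands for $p_{S/2}$, and slices $i$ and $i + S/2$ are assumed to form a pair. For $S = 8$ and $p_{S/2} = .2$ the bound evaluates to about 0.983, below the optimum of 1.32 listed in Table 3.3.

```python
from itertools import product


def lower_bound_closed_form(S: int, p: float) -> float:
    """t_opt >= 2 - (1 - p^2)^(S/2) - (1 - p)^S, with p = p_{S/2}."""
    return 2.0 - (1.0 - p * p) ** (S // 2) - (1.0 - p) ** S


def lower_bound_enumerated(S: int, p: float) -> float:
    """Same bound by enumeration: grant 2 if some pair holds 2 requests,
    1 if some pair holds exactly 1 and none holds 2, else 0 (leftovers ignored)."""
    half = S // 2
    total = 0.0
    for pattern in product((0, 1), repeat=S):
        prob = 1.0
        for bit in pattern:
            prob *= p if bit else 1.0 - p
        sizes = [pattern[i] + pattern[i + half] for i in range(half)]
        grants = 2 if 2 in sizes else (1 if 1 in sizes else 0)
        total += prob * grants
    return total


print(round(lower_bound_closed_form(8, 0.2), 4), round(lower_bound_enumerated(8, 0.2), 4))
```

The two functions agree to rounding error, which confirms the probability algebra above.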

The throughput as a function of $S$ gives some idea of the scalability of the Ringbus. The throughput also varies with the traffic, as reflected by the values of the $p_i$. This latter factor affects the throughput the most: with $p_0 \approx 0$, the throughput can vary from 2 to somewhere between $\frac{2}{3}S$ and $\frac{7}{8}S$. The sensitivity of the throughput to the distribution of request lengths is perhaps best illustrated by the bound $t \le S/\bar{l}$ from section 3.5.1.1.
3.8 The Performance of the Concert Ringbus

In this section we investigate the performance of the Concert Ringbus and compare its performance with that of the Symmetric Ringbus. (We, of course, have to specify some arbitration scheme for the Symmetric Ringbus. We do this shortly.) The Concert Ringbus has asymmetrical access paths and a rotating priority arbitration algorithm, as discussed in section 3.1. The investigation and comparison consist of three parts:

1) We determine the effect of the asymmetrical access paths by comparing the optimum throughput with asymmetrical access paths (i.e. the Asymmetrical Ringbus) to the optimum throughput with symmetrical access paths (i.e. the Symmetrical Ringbus).

2) We determine the effect of the rotating priority arbitration algorithm by comparing the throughput with this algorithm for the Symmetric Ringbus with the optimum throughput for the Symmetrical Ringbus.

3) We determine the effect of both the asymmetrical access paths and the rotating priority arbitration algorithm (i.e. the Concert Ringbus) by comparing the throughput with these to the optimum throughput with symmetrical access paths.

We consider only four slice Ringbuses. There are, unfortunately, too many states to consider Markov chain models for six or more slices. The state description with the asymmetrical access paths in part 1 remains

$$(r_1, r_2, \ldots, r_S)$$

where $r_i = -(S/2 - 1), \ldots, -1, 0, 1, \ldots, S/2$, but flip symmetry can no longer be utilized to reduce the number of states, because a request in the counterclockwise direction requires more segments than a request of a similar number of hops in the clockwise direction. Thus, for $S = 4$ the number of states is 70, an increase of about 63% above the 43 states for the Symmetric Ringbus. A similar increase for $S = 6$ would put the number of states at about 7400. This number may not seem all that unreasonable. However, we felt it was not worth pursuing part 1 for $S = 6$ if we could not also pursue parts 2 and 3 for $S = 6$. The state description with rotating priority is

$$(r_1, r_2, \ldots, r_S, p)$$

where $p_i = (p + i) \bmod S$ is the priority of the request at slice $i$ and $r_i$ is the same as before. For $S = 4$, the number of states for symmetrical access paths is 127 and the number of states for asymmetrical access paths is 214. Similar increases for $S = 6$ would put the number of states above 10,000, which we consider to be too many states.
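The counts of 43 and 70 states for $S = 4$ can be reproduced by brute force. The Python sketch below enumerates all $(r_1, \ldots, r_S)$ and groups them into classes under rotation, and optionally under flip. The flip rule it uses (reverse the slice order and map $r \to -r$, with $r = S/2$ mapped to itself because the two directions of a half-way request coincide) is our assumption about how flip symmetry acts on this state description. The function name is hypothetical.

```python
from itertools import product


def count_states(S: int, use_flip: bool) -> int:
    """Count states (r_1, ..., r_S), r_i in {-(S/2 - 1), ..., S/2}, up to
    rotation and, if use_flip is True, also up to flip (mirror image)."""
    half = S // 2
    values = range(-(half - 1), half + 1)

    def mirror(r: int) -> int:
        # Assumed flip: reverse direction, except S/2 maps to itself.
        return r if r == half else -r

    seen = set()
    orbits = 0
    for state in product(values, repeat=S):
        if state in seen:
            continue
        orbits += 1  # first member of a new equivalence class
        for k in range(S):
            rotated = state[k:] + state[:k]
            seen.add(rotated)
            if use_flip:
                seen.add(tuple(mirror(r) for r in reversed(rotated)))
    return orbits


print(count_states(4, use_flip=True), count_states(4, use_flip=False))
```

Under these assumptions it reports 43 states with flip symmetry and 70 without, a ratio of $70/43 \approx 1.63$.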

We could have pursued parts 2 and 3 for larger values of $S$ via simulation. In fact, we did do this for $S = 8$; the results are reported in Chapter 4. However, the simulations reported in Chapter 4 are for the entire Concert model, not just the Ringbus as is our focus here. For example, the simulations reported in Chapter 4 assume grant durations of nine rounds and arbitration times of two rounds; we assume single round grant durations and instantaneous arbitration here. Part 1 cannot be carried out via simulation since optimization is impractical via simulation.

3.8.1 The Effect of Asymmetrical Access Paths

One factor complicating the comparison of the optimum throughput with asymmetrical access paths to the optimum throughput with symmetrical access paths is that users may adapt their programs to suit the topology. As a result, requests may be biased in favour of the clockwise direction in the former case and unbiased in direction in the latter case. To avoid biasing the comparison, we present the results for various asymmetrically weighted and symmetrically weighted request probabilities for both asymmetrical and symmetrical access paths.

Figures 3.26, 3.27, and 3.28 show the optimum throughput with asymmetrical and symmetrical access paths for $p_{-1} = p_1$, $p_{-1} = .5p_1$, and $p_{-1} = 0$ respectively. ($p_{-1}$ is the probability of a request of one hop in the counterclockwise direction.) Note that the optimum throughput with asymmetrical and symmetrical access paths is identical for $p_{-1} = 0$; hence only one set of points is shown in Figure 3.28. As expected, the difference in the optimum throughputs for asymmetrical and symmetrical access paths increases as $p_{-1}$ increases.

3.8.2 The Effect of the Rotating Priority Arbitration Algorithm

Figure 3.29 shows the optimum throughput of the Symmetric Ringbus and the throughput of the Symmetric Ringbus with the rotating priority arbitration algorithm used in the Concert Ringbus. For very light traffic, the throughput with rotating priority is close to the optimum. For all other traffic, the throughput with rotating priority quickly deteriorates with respect to the optimum. The maximum throughput, at $p_1 = .5$, with rotating priority is .42 less than the optimum throughput. For $p_2 = 1$ the deterioration is especially severe. Even though two requests can be granted without conflicting, the rotating priority algorithm only grants one request. The reason for this stupidity is that slices are assigned consecutively decreasing priorities in the clockwise direction from the highest priority slice. Since no request can be granted which may conflict with one at a higher priority, a long request currently blocked by a request granted by a higher priority slice can nevertheless prevent an otherwise grantable request from being granted, due to a conflict with the higher priority long request. An example of such a situation is shown in Figure 3.30.


Figure 3.26: A comparison of optimum throughputs for four slices and one round grant durations with $p_{-1} = p_1$
Figure 3.27: A comparison of optimum throughputs for four slices and one round grant durations with $p_{-1} = .5p_1$
$P_i$ is the priority of the request at slice $i$


Figure 3.30: Example of a disadvantage of the rotating priority algorithm

Slice 1's request, which may be granted because it has the highest priority, conflicts with the lower priority slice 2's request; hence slice 2's request cannot be granted. However, slice 2's request conflicts with the lower priority slice 3's request, and thus slice 3's request cannot be granted either, even though it is otherwise grantable. An obvious fix to the problem is to stagger the slice priorities as shown in Figure 3.31.
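The blocking chain of Figure 3.30 can be reproduced with a toy model. In the Python sketch below, which is illustrative only, each request is represented by a hypothetical set of the resources (segments and destination) it would occupy, so two requests conflict when their sets intersect. The first function follows the rule stated in the text. The second is a comparison rule that is neither the Concert arbiter nor the staggered-priority fix: it lets slice 3 through because only granted requests block.

```python
from typing import Dict, List, Set


def grant_rotating_priority(requests: Dict[int, Set[str]], order: List[int]) -> List[int]:
    """Rule described in the text: a request is granted only if it conflicts
    with no higher-priority request, whether or not that request was granted."""
    granted = []
    for pos, slice_id in enumerate(order):
        higher = order[:pos]
        if all(requests[slice_id].isdisjoint(requests[h]) for h in higher):
            granted.append(slice_id)
    return granted


def grant_against_granted_only(requests: Dict[int, Set[str]], order: List[int]) -> List[int]:
    """Comparison rule (not the Concert design): a request is blocked only by
    requests that have actually been granted."""
    granted: List[int] = []
    busy: Set[str] = set()
    for slice_id in order:
        if requests[slice_id].isdisjoint(busy):
            granted.append(slice_id)
            busy |= requests[slice_id]
    return granted


# Hypothetical resources mimicking Figure 3.30: slice 1 conflicts with
# slice 2, and slice 2 conflicts with slice 3, but 1 and 3 do not conflict.
requests = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
order = [1, 2, 3]  # consecutive decreasing priority
print(grant_rotating_priority(requests, order))
print(grant_against_granted_only(requests, order))
```

The first rule grants only slice 1, while the second grants slices 1 and 3, which is the lost grant the text describes.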




Figure 3.31: Staggered slice priorities

The consecutive assignment of slice priorities around the Ringbus will also obviously lead to throughput degradation for a larger number of slices, such as $S = 8$, and for cases in which clockwise requests of greater than one hop predominate. The priorities can be staggered in a manner similar to that in Figure 3.31 to reduce this degradation. Interestingly, it is easy to modify the Concert Ringbus to effect such a change to the assignment of the slice priorities. A new arbiter priority ROM (a 2K × 8 ROM) is all that is required.

A different, but still simple, improvement to the throughput of the Symmetric Ringbus with the rotating priority algorithm is to change the direction of the rotation. When the priorities are updated in the Concert Ringbus arbiter, the highest priority is assigned to the next slice with a pending request in the counterclockwise direction from the current highest priority slice. Clockwise rotation of the priority yields better throughput (assuming the slices are assigned consecutively decreasing priorities in the clockwise direction from the highest priority slice). The maximum improvement in throughput from reversing the priority rotation from counterclockwise to clockwise
is .10, which is attained at $p_0 = .5$. As before, a new arbiter priority ROM is all that is required to implement clockwise priority rotation.
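The two rotation directions differ only in the direction in which the arbiter searches for the next slice with a pending request. Below is a minimal sketch. It assumes slices numbered 0 to S-1, with clockwise meaning increasing index; that convention is chosen here and is not stated in the thesis. `next_highest_priority` and `priority_order` are hypothetical helper names.

```python
def next_highest_priority(current, pending, S, clockwise):
    """Return the slice that receives the highest priority after an update.

    Steps around the ring from `current` in the chosen direction and returns
    the first slice with a pending request. Assumed convention: slices are
    numbered 0..S-1 and clockwise means increasing index.
    """
    step = 1 if clockwise else -1
    for k in range(1, S + 1):
        s = (current + step * k) % S
        if s in pending:
            return s
    return current  # nothing pending: keep the current assignment


def priority_order(highest, S):
    """Priorities decrease consecutively in the clockwise direction."""
    return [(highest + k) % S for k in range(S)]


pending = {0, 2, 3}
print(next_highest_priority(1, pending, 4, clockwise=False))  # 0 (counterclockwise, as in Concert)
print(next_highest_priority(1, pending, 4, clockwise=True))   # 2 (reversed rotation)
print(priority_order(2, 4))                                   # [2, 3, 0, 1]
```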

3.8.3 The Effect of Asymmetrical Access Paths and the Rotating Priority Arbitration Algorithm

Figures 3.32, 3.33, and 3.34 show three throughputs: the throughput with asymmetrical access paths and the rotating priority algorithm, the optimum throughput with asymmetrical access paths, and the optimum throughput with symmetrical access paths. The three figures correspond to $p_{-1} = p_1$, $p_{-1} = .5p_1$, and $p_{-1} = 0$ respectively. As in Figure 3.29, the rotating priority algorithm imposes a degradation in throughput (as compared with the optimum throughput with asymmetrical access paths) that increases as $p_1$ or $p_2$ or both increase. For $p_1 = p_{-1} = .5$, the degradation is .30 or 16%. Again, the degradation is especially severe for $p_2 = 1.0$.

The throughput degradation is mostly attributable to the rotating priority algorithm if $p_2$ is large. It is mostly attributable to the asymmetrical access paths if $p_1$ and $p_{-1}$ are both large and the request probabilities are the same with symmetrical and asymmetrical access paths. (This comparison can be misleading. In any Ringbus with asymmetrical access paths, the request probabilities would probably have a strong clockwise bias in direction, whereas in any Ringbus with symmetrical access paths they would probably be relatively unbiased. See the paragraph at the beginning of section 3.8.1.) The throughput degradation attributable to the asymmetrical access paths diminishes as $p_{-1} \to 0$ if the request probabilities are the same with symmetrical and asymmetrical access paths. (The same parenthetical note applies to this statement too.)

We expect all the trends observable in Figures 3.32, 3.33, and 3.34 to be accentuated with larger values of S.
Figure 3.32: A comparison of throughputs for four slices and one round grant duration with $p_{-1} = p_1$
Figure 3.33: A comparison of throughputs for four slices and one round grant duration with $p_{-1} = .5p_1$
3.9 The Ringbus in the Concert Environment

So far in this chapter we have considered the Ringbus model in isolation. Now we consider some of the differences between this artificial environment and the Concert environment. We discuss the effects that these differences have on the operation and performance of the Ringbus. In section 3.9.1 we discuss the details of the Multibus-Ringbus connection and develop the hooks for the integration of the Ringbus model with the Multibus models in Chapter 4.

The major differences between the artificial environment of the isolated Ringbus and the Concert environment are:

1) the duration of the grants,

2) the arbitration time,

3) the dead time between successive Ringbus requests, and

4) global register accesses.

The duration of a grant is the total duration for which Ringbus segments are allocated to a request. As reported in section 3.3.2 of Appendix A, this duration is 9 or 10 arbiter clock cycles (i.e. 9 or 10 rounds) for read and write accesses when the arbiter clock period is 200 nsec. Apart from one case, a geometrically distributed grant duration with a mean of 10 rounds, we did not investigate the isolated Ringbus model for such long grant durations. Furthermore, this case with a mean grant duration of 10 rounds applied only for S = 4 and symmetric access paths. Thus, except in this one special case, grant durations in the Concert environment are much longer than those we considered for the isolated Ringbus model.

As discussed in section 3.4, we expect that the effect of the long grant durations on the optimum performance of the Ringbus can be estimated fairly well from the optimum throughput with a deterministic grant duration of one round and equation 3.18. It should be possible to estimate the effect of long grant durations on the throughput for arbitration algorithms other than the optimum by similar means. We expect, then, that the performance of the Ringbus is initially quite sensitive to the duration of grants and decreases rapidly as the duration of grants increases.

The arbitration time (or, more precisely, the arbitration delay) can be divided into two components. At some point during the arbitration time, the arbiter decides (or can be regarded as deciding) whether or not to grant a request. The rest of the time consists of a delay gathering request information before the decision and a delay communicating and implementing the decision. Thus the arbitration time may be treated by assuming instantaneous arbitration and adding two delays: the appropriate request gathering delay is added to the request interarrival time, and the appropriate grant implementation delay is added to the grant duration. Increasing the request interarrival time (i.e. increasing $p_0$) and increasing the grant duration both decrease the throughput. The exact effect of adding these delays depends on the magnitudes of the delays and on the parameter values for the requests and grant durations. The arbitration delay in Concert is two rounds: one round of request gathering delay and one round of grant implementation delay. For light to medium loading, the resultant additional clock cycle of request interarrival time will cause little change in $p_0$ and hence will have little effect on performance. Likewise, the effect of the additional clock cycle of grant duration will be small, since grant durations are already quite long in Concert.
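The sketch below illustrates why one extra round of interarrival time barely moves $p_0$ under light loading. It assumes, as in section 3.9.1, that the processing time is geometric with mean $p_0/(1-p_0)$ rounds, and that the request gathering delay simply adds one round to that mean. The function names are illustrative only.

```python
def mean_rounds_from_p0(p0):
    """Mean of a geometric processing time in rounds: p0 / (1 - p0)."""
    return p0 / (1.0 - p0)


def p0_from_mean_rounds(mean_rounds):
    """Inverse relation: p0 = m / (1 + m)."""
    return mean_rounds / (1.0 + mean_rounds)


def p0_with_extra_rounds(p0, extra_rounds=1):
    """p0 after lengthening the mean interarrival time by `extra_rounds`."""
    return p0_from_mean_rounds(mean_rounds_from_p0(p0) + extra_rounds)


for p0 in (0.5, 0.9, 0.99):
    print(f"p0 = {p0:.2f} -> {p0_with_extra_rounds(p0):.3f}")
```

For $p_0 = .5$ the extra round raises $p_0$ to about .667. For $p_0 = .9$ and $.99$ it changes $p_0$ only to about .909 and .990.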

The dead time between successive Ringbus requests is the minimum time between the end of a Ringbus grant and the next nonnull request generated from the same slice. In our isolated Ringbus model we assumed a dead time of zero. However, in Concert there is a dead time of 2 or 3 rounds. (The dead time corresponds to the minimum value of $t_p^{RB,eqv}$, which is reported in section 3.3.2 of Appendix A. We define $t_p^{RB,eqv}$ and discuss the details of the Multibus-Ringbus interaction in section 3.9.1.) Since the dead time is relatively small compared to the total duration of a grant, we do not expect it to have a large direct effect on the performance of the Ringbus relative to the performance predicted by our isolated Ringbus models. There will, of course, be an indirect effect. The dead time portion of the processing time is not well approximated by the geometric distribution that we assume for the processing time in our isolated Ringbus models. The consequence of the dead time is that the mean processing time must be at least 2 or 3 rounds, and thus $p_0 \ge 2/3$. This corresponds to light traffic in our isolated Ringbus models.
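As a quick check of the bound quoted above: the mean of the geometric processing time in rounds is $p_0/(1-p_0)$, which is the relation used in section 3.9.1. A mean of at least 2 or 3 rounds therefore translates into the following bounds on $p_0$ (valid for $0 \le p_0 < 1$):

$$\frac{p_0}{1-p_0} \ge 2 \iff p_0 \ge 2(1-p_0) \iff p_0 \ge \frac{2}{3}, \qquad \frac{p_0}{1-p_0} \ge 3 \iff p_0 \ge \frac{3}{4}.$$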

We have already discussed global register accesses. Accesses to global registers on a slice different from the slice originating the access can be treated as special global memory requests. Accesses to global registers on the same slice that originates the access cannot be treated in this manner; instead, we simply ignore such accesses. We expect global register accesses to be infrequent in normal operation, so the effect of ignoring such accesses in our isolated Ringbus models should be minimal in most cases.

Note that there is additional information available in the Concert environment that could conceivably allow the Ringbus arbiter to achieve better performance. In Concert, the only information available to the Ringbus arbiter is the type of request or grant present at each slice. From this information, the arbiter is able to infer, at each slice, how long the request has been pending or how long the grant has been in progress. Other information is available in the Concert environment but not to the arbiter: the number and type (i.e. Multibus or Ringbus) of requests in each Multibus queue, and the waiting time so far of each request.

All other Multibus activity is blocked during the entire duration of a Ringbus access, even during the period in which the access waits for use of the Ringbus. The arbiter could therefore conceivably give priority to Ringbus accesses that block a large number of Multibus accesses, and thereby improve the overall throughput of Concert.
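One conceivable form of such a policy is to rank pending Ringbus requests by the number of Multibus accesses queued behind them. The sketch below is purely hypothetical. The Concert arbiter does not have this queue information, and the function name, its inputs, and the tie-breaking rule (longer waiting time first) are assumptions made for illustration.

```python
def order_by_blocked_accesses(pending, blocked, waited):
    """Rank pending Ringbus requests, most-blocking first (hypothetical policy).

    pending: slices with a pending Ringbus request
    blocked: slice -> number of Multibus accesses queued behind that request
    waited:  slice -> rounds the request has already waited (tie-breaker)
    """
    return sorted(pending,
                  key=lambda s: (blocked.get(s, 0), waited.get(s, 0)),
                  reverse=True)


print(order_by_blocked_accesses([0, 1, 2], {0: 1, 1: 4, 2: 1}, {0: 3, 2: 5}))
# [1, 2, 0]
```

An arbiter using such a ranking would still have to choose a conflict-free subset of requests. The ranking would only replace the rotating priority order.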


Finally, note that although the arbiter clock period does affect the performance of the Ringbus, the effect is not as large as one might expect. The reason is that a considerable fraction of the duration of a Ringbus grant is approximately constant, independent of the arbiter clock period.

3.9.1 The Equivalent Model of the Ringbus

As discussed in section 1.3.5, the Ringbus can be replaced by an equivalent model for each slice-Ringbus connection. The equivalent model for each slice-Ringbus connection is the Ringbus access time distribution as seen by that slice. In determining these equivalent models of the Ringbus, we assume that each slice has been replaced by its single processor equivalent. This single processor equivalent has some processing time distribution, with mean $\bar t_p^{MB,eqv}$, and some Ringbus destination probabilities $p_i^{MB,eqv}$, $i = -(S/2 - 1), \ldots, -1, 1, 2, \ldots, S/2$, where S is the number of slices. We assume that all of the single processor equivalent models are identical and that the Ringbus is symmetric with respect to each slice. Under these latter two assumptions, the equivalent models of the Ringbus are identical for each slice-Ringbus connection, and thus the Ringbus is completely characterized by one equivalent model. As noted in section 1.3.5, this means that only one Multibus-Ringbus connection need be considered during integration.

The single processor equivalent of the Multibus and the Ringbus each perceive a Ringbus access cycle in a different way. From the point of view of the single processor equivalent, a Ringbus access cycle consists of a processing time, denoted by $t_p^{MB,eqv}$, and an access time, denoted by $t_{aRB}$. $t_{aRB}$ includes the waiting time, if any, of the Ringbus request generated by the access. The probability distribution of $t_p^{MB,eqv}$ incorporates the Multibus waiting time. Figure 3.35 depicts the point of view of the single processor equivalent.

[Figure: Ringbus access signal (active low), marking $t_{aRB}$ and $t_p^{MB,eqv}$]


Figure 3.35: Point of view of single processor equivalent

From the point of view of the Ringbus, a Ringbus access cycle consists of a processing time, a waiting time, and a grant duration, all defined relative to the arbiter clock. We define the grant duration as the total duration for which Ringbus segments are allocated to the Ringbus request generated by the access; we denote the grant duration by d. We define the processing time as the interval between the termination of a grant and the commencement of the following grant in the absence of contention in the Ringbus. We denote this interval by $t_p^{RB,eqv}$ to indicate the processing time as seen by the Ringbus. Finally, we define the waiting time to be the duration that a grant is delayed due to Ringbus contention; we denote it by $w_{RB}$. We measure $w_{RB}$ and d synchronous to the rising edges of the arbiter clock. Figure 3.36 depicts the point of view of the Ringbus.
Figure 3.36: Point of view of Ringbus

We now combine the point of view of the single processor equivalent of the Multibus and the point of view of the Ringbus. Central to this combination are two facts: 1) the Multibus operates asynchronously with respect to the Ringbus arbiter, and 2) the arbitration for the Ringbus takes some nonzero time. We define $t_{latch}$ as the time required to synchronize a Multibus request for a Ringbus access with the arbiter clock. More precisely, $t_{latch}$ is the interval between the arrival of a request at the Ringbus arbiter and the next rising edge of the arbiter clock.‡ We define $t_{arb}$ as the arbitration delay of the Ringbus arbiter. ($t_{arb}$ is some integral multiple of the arbiter clock period.) In addition, we define $t_{start}$ as the interval between the initiation of a Ringbus access on the Multibus and the arrival of the corresponding Ringbus request at the Ringbus arbiter. ($t_{start}$ reflects two things: the time that a processor takes to put valid signals on the Multibus once it has seized control of the Multibus, and the time that the Ringbus interface takes to decode these signals.) (We consider an access on the Multibus to initiate when a processor seizes control of the Multibus and to terminate when the processor releases control of the Multibus. See section 2 of Appendix A for details.) Through various quirks in the timing of Multibus and Ringbus signals, the termination of an access and the disassertion of the Ringbus request at the Ringbus arbiter occur at approximately the same time. (See section 3.3.2 of Appendix A.) We assume this to be the case here, and thus we do not introduce a corresponding delay.

The combined points of view of the single processor equivalent and the Ringbus are pictured in Figure 3.37 along with the quantities just defined.

† These signals are drawn as active low to parallel the signals in the actual Concert system.
‡ In this section we ignore delays that would normally be introduced to mitigate metastability problems, and we assume that the arbiter inputs are sampled on every rising edge of the arbiter clock.
Figure 3.37: Combined points of view of single processor equivalent and Ringbus


Note that the access time of the single processor equivalent and the duration for which segments are allocated in the Ringbus (i.e. the grant duration) are out of phase. Of course, the actual period during which the data transfer occurs is the same for the single processor equivalent and the Ringbus. We denote this time by $t_{trans}$. The arbitration delay skews the total time allocated to the access in the respective worlds of the single processor equivalent and the Ringbus.


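Taking means, we have

$$\bar t_p^{MB,eqv} + \bar t_{aRB} = \bar t_p^{RB,eqv} + \bar w_{RB} + d,$$

thus

$$\bar t_{aRB} = \bar t_{aRB}^{(nom)} + \bar w_{RB},$$

where $\bar t_{aRB}^{(nom)} = \bar t_p^{RB,eqv} + d - \bar t_p^{MB,eqv}$ is the mean Ringbus access time when there is no contention on the Ringbus, i.e. when $w_{RB} = 0$. Note that

$$\bar t_{aRB}^{(nom)} = t_{start} + \bar t_{latch} + t_{arb} + t_{trans}.$$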


The inputs to the equivalent model of the Ringbus are the probability distribution of $t_p^{MB,eqv}$ and the Ringbus destination probabilities $p_i^{MB,eqv}$. The output is the probability distribution of $t_{aRB}$. The inputs to the actual Ringbus model are the probability distribution of $t_p^{RB,eqv}$ and the Ringbus destination probabilities $p_i^{RB,eqv}$. The output of the actual Ringbus model is the


† More precisely, we have

$$t_p^{RB,eqv} = \begin{cases} t_p^{MB,eqv} + t_{start} + t_{latch} + t_{arb} + t_{trans}^{prev} - d^{prev} & \text{if } t_p^{MB,eqv} + t_{start} \ge d^{prev} - t_{trans}^{prev} \\ t_{arb} + C & \text{if } t_p^{MB,eqv} + t_{start} < d^{prev} - t_{trans}^{prev} \end{cases}$$

where C is the arbiter clock period and the superscript prev denotes the quantities from the previous Ringbus access. If $t_p^{MB,eqv} + t_{start} < d^{prev} - t_{trans}^{prev}$, a new Ringbus request arrives at the Ringbus arbiter before the grant of the previous request has been disasserted. This previous grant must be disasserted before the new request can be accepted by the arbiter; hence the new grant follows the old by the arbitration time plus one entire arbiter clock cycle for latching. The interval $t_{arb} + C$ is the dead time mentioned earlier.
throughput, or alternatively, $\bar w_{RB}$. The two are related by

$$\frac{S}{\bar t_p^{RB,eqv}/c + \bar w_{RB} + d} = \frac{g}{d}$$

where c is the arbiter clock period, $\bar w_{RB}$ and d are expressed in rounds, and g is the average number of new or continuing grants per clock period (i.e. round). To say anything more about the relation between $\bar t_p^{RB,eqv}$, $p_i^{MB,eqv}$, and $\bar w_{RB}$, we now need to consider a specific Ringbus model. We, of course, assume the Ringbus model discussed in this chapter. Specifically, we do the following:

1) We approximate the probability distribution of $t_p^{RB,eqv}$ by a discrete geometric distribution with the same mean. Thus $p_0$ in our Ringbus model can be computed from the relation

$$\frac{p_0}{1 - p_0} = \frac{\bar t_p^{RB,eqv}}{c}$$

where c is the arbiter clock period.

2) We set $p_i = p_i^{MB,eqv}$, $i = -(S/2 - 1), \ldots, -1, 1, \ldots, S/2$, for the other Ringbus request probabilities.

3) We set the grant duration equal to d. We could just as easily allow geometric or arbitrary discrete probability distributions for the grant duration, provided that the Ringbus model allowed such distributions. We assume a deterministic grant duration for simplicity, and because observed grant durations in Concert are very nearly deterministic for reads and writes. (See section 3.3.2 of Appendix A.)

The Ringbus model can now be solved for g, and $\bar w_{RB}$ computed from

$$\frac{S}{\frac{p_0}{1 - p_0} + \bar w_{RB} + d} = \frac{g}{d}.$$


Finally, we can obtain $\bar t_{aRB}$.
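The last two steps amount to rearranging the throughput relation for the mean waiting time and adding that waiting time to the contention-free access time. The sketch below assumes that $\bar w_{RB}$ and d are in rounds, and that g has already been obtained from some solver of the Ringbus model (not shown). The numbers are hypothetical: c = 200 nsec matches the Concert arbiter clock, while $\bar t_{aRB}^{(nom)}$ = 1800 nsec is chosen arbitrarily. The function names are illustrative.

```python
def mean_wait_rounds(g, S, p0, d):
    """Mean Ringbus waiting time in rounds implied by throughput g.

    Rearranges S / (p0 / (1 - p0) + w + d) = g / d, with w and d in rounds.
    """
    if g <= 0:
        raise ValueError("throughput g must be positive")
    return S * d / g - p0 / (1.0 - p0) - d


def mean_access_time(t_access_nom, w_rounds, c):
    """Contention-free access time plus the waiting time converted to time units."""
    return t_access_nom + w_rounds * c


w = mean_wait_rounds(g=2.0, S=4, p0=0.5, d=10)
print(w)                                    # 9.0
print(mean_access_time(1800.0, w, 200.0))   # 3600.0
```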


Note that, because we approximate the distribution of $t_p^{RB,eqv}$ by a discrete geometric distribution, we only need $\bar t_p^{RB,eqv}$ as an input to the Ringbus model. Recall that

$$\bar t_p^{RB,eqv} = \bar t_p^{MB,eqv} + \bar t_{aRB}^{(nom)} - d = \bar t_p^{MB,eqv} + t_{start} + \bar t_{latch} + t_{arb} + t_{trans} - d.$$

$\bar t_p^{MB,eqv}$ is an input to the Ringbus equivalent model. $t_{start}$, $t_{arb}$, $t_{trans}$, and d are constants that can be determined by empirical observations; such observations are reported in Section 3.3 of Appendix A. However, $\bar t_{latch}$ actually depends on $\bar t_p^{MB,eqv}$, $t_{trans}$, d, and $t_{arb}$. We define $t_{aRB}^{clock}$ as the interval from the completion of a Ringbus access (from the point of view of the single processor equivalent) to the next rising edge of the arbiter clock. Thus $t_{aRB}^{clock} = d - t_{arb} - t_{trans}$.


For $\bar t_p^{MB,eqv}$ large, $t_{aRB}^{clock}$ is irrelevant. Since the Multibus model (and the single processor equivalent model) is asynchronous with respect to the arbiter clock, we have $\bar t_{latch} = .5c$. This situation is depicted in Figure 3.38.
Figure 3.38: Signals when $t_p^{MB,eqv}$ large


For $\bar t_p^{MB,eqv}$ small, $t_{aRB}^{clock}$ becomes important. The reason is that the Multibus model (and the single processor equivalent model) may generate a request at any time after the completion of a Ringbus access. The arbiter, however, cannot accept and act on the request until at least the next rising edge of the arbiter clock after the previous access. (Of course, acceptance of the request must also wait until after the previous grant; see section 3.3.2 of Appendix A for details. In the figures we assume $t_{arb} = 0$ for clarity of presentation, so the grant terminates at the next clock edge after the request terminates.) In fact, for $t_p^{MB,eqv} = 0$, we have $t_{latch} = t_{aRB}^{clock}$; this condition can occur when many processors on a Multibus are all accessing the Ringbus. Thus $t_{latch}$ can vary from 0 to c (ignoring setup and hold times on the arbiter input devices). Figure 3.39 depicts an example with $t_{latch} = .9c$.


$$\bar t_{latch} = \begin{cases} t_{aRB}^{clock} & \text{if a Ringbus access is followed immediately by another Ringbus request} \\ .5c & \text{otherwise} \end{cases}$$

Therefore $\bar t_{latch} \approx .5c(1 - P_{RB}) + t_{aRB}^{clock} P_{RB} = .5c + P_{RB}(d - t_{trans} - t_{arb} - .5c)$, where $P_{RB}$ is the probability that a Ringbus access is followed immediately by a Ringbus request. In other words, $P_{RB}$ is a conditional probability, given that the present access is a Ringbus access. It is the probability that the Multibus queue is nonempty at the termination of the present access and that the next request in the Multibus queue is for the Ringbus. $P_{RB}$ can be determined from the Multibus model: it is another output of the single processor equivalent model of the Multibus.
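The expression for $\bar t_{latch}$ can be evaluated directly once $P_{RB}$ is known. In the sketch below, all times share one unit. The values c = 200 nsec, d = 10 rounds, and $t_{arb}$ = 2 rounds follow the Concert figures quoted in section 3.9. The values $t_{trans}$ = 1450 nsec and $P_{RB}$ = .5 are hypothetical, chosen so that $t_{aRB}^{clock}$ lies between 0 and c. The function name is illustrative.

```python
def mean_latch_time(P_RB, c, d, t_arb, t_trans):
    """Mean synchronization delay t_latch (all arguments in the same time unit).

    With probability P_RB the next Ringbus request follows the previous access
    immediately and waits t_clock = d - t_arb - t_trans for the next clock edge;
    otherwise the request arrives at a random phase and waits c/2 on average.
    """
    t_clock = d - t_arb - t_trans
    if not 0.0 <= t_clock <= c:
        raise ValueError("expected 0 <= d - t_arb - t_trans <= c")
    return 0.5 * c * (1.0 - P_RB) + t_clock * P_RB


# Hypothetical values in nsec: c = 200, d = 10 rounds, t_arb = 2 rounds.
print(mean_latch_time(P_RB=0.5, c=200.0, d=2000.0, t_arb=400.0, t_trans=1450.0))
# 125.0
```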

In summary, the Ringbus equivalent model has three inputs from the single processor equivalent model: $p_i^{MB,eqv}$ (for $i = -(S/2 - 1), \ldots, -1, 1, \ldots, S/2$), $\bar t_p^{MB,eqv}$, and $P_{RB}$. In addition, it has four other inputs: $t_{start}$, $t_{arb}$, $t_{trans}$, and d. Note that only means are required for the inputs to the Ringbus equivalent model, except for $p_i^{MB,eqv}$ and $P_{RB}$. Formally, the output of the Ringbus equivalent model is the probability distribution of $t_{aRB}$. However, in section 2.9.2 we assumed an exponential probability distribution for $t_{aRB}$ in the single processor equivalent model, and this distribution is completely characterized by $\bar t_{aRB}$. Thus we only require $\bar t_{aRB}$ as an output of the Ringbus equivalent model.
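Putting the pieces of this section together, the Ringbus equivalent model maps the three single processor equivalent inputs and the four constants to $\bar t_{aRB}$. The sketch below assumes a caller-supplied function `solve_g` that solves the Ringbus model of this chapter for g; that interface is hypothetical. The no-contention stand-in used in the example is not the thesis model. It is only a way to check that the bookkeeping returns $\bar t_{aRB}^{(nom)}$ when $\bar w_{RB} = 0$. The numeric inputs are hypothetical as well.

```python
from typing import Callable, Dict

# Hypothetical solver interface for this chapter's Ringbus model:
# (p0, destination probabilities, grant duration in rounds) -> g, grants per round.
RingbusSolver = Callable[[float, Dict[int, float], int], float]


def ringbus_equivalent_model(p_dest: Dict[int, float], t_p_mb: float, P_RB: float,
                             t_start: float, t_arb: float, t_trans: float,
                             d_rounds: int, c: float, S: int,
                             solve_g: RingbusSolver) -> float:
    """Return the mean Ringbus access time seen by the single processor equivalent.

    Times share one unit (e.g. nsec); d_rounds is the grant duration in rounds.
    """
    d = d_rounds * c
    # Mean synchronization delay at the arbiter.
    t_latch = 0.5 * c * (1.0 - P_RB) + (d - t_arb - t_trans) * P_RB
    # Contention-free access time and the processing time seen by the Ringbus.
    t_access_nom = t_start + t_latch + t_arb + t_trans
    t_p_rb = t_p_mb + t_access_nom - d
    if t_p_rb < 0:
        raise ValueError("inputs imply a negative Ringbus processing time")
    # Geometric approximation: p0 / (1 - p0) = t_p_rb / c.
    x = t_p_rb / c
    p0 = x / (1.0 + x)
    # Solve the isolated Ringbus model, then recover the mean waiting time.
    g = solve_g(p0, p_dest, d_rounds)
    w_rounds = S * d_rounds / g - x - d_rounds
    return t_access_nom + w_rounds * c


def no_contention(p0, p_dest, d_rounds):
    # Stand-in, not the thesis model: four slices, every request granted at once.
    return 4 * d_rounds / (p0 / (1.0 - p0) + d_rounds)


t = ringbus_equivalent_model({1: 0.5, -1: 0.5}, t_p_mb=1000.0, P_RB=0.5,
                             t_start=300.0, t_arb=400.0, t_trans=1450.0,
                             d_rounds=10, c=200.0, S=4, solve_g=no_contention)
print(round(t, 6))  # 2275.0
```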
3.10 Conclusions

Conclusions 1 to 5 pertain to the definition of the Ringbus given in section 3.1 and the various assumptions that we made. These assumptions are listed below:

1) even number of slices

2) no propagation delays or metastability settling delays

3) memoryless (i.e. geometric) probability distribution for nonnull request arrivals

4) symmetric request probabilities

5) no global registers

6) all slices identical in all respects

7) all probability distributions stationary and all processes in steady-state

8) no bound on request waiting time

9) deterministic or geometrically distributed grant durations of an integral number of arbiter clock periods

10) instantaneous arbitration time (i.e. no arbitration time)

11) no start up time, no end time, and no dead time


1. For six or more slices, the optimum performance of the Ringbus is difficult to determine and analyze, because of the large number of states, even with all the simplifying assumptions.

2. The optimal arbiter algorithm depends strongly on the request probabilities; no one arbiter algorithm is best. In addition, the optimum performance of the Ringbus depends very strongly on the request probabilities. For requests of length one, the maximum throughput is between $\frac{2}{3}S$ and $\frac{7}{8}S$, where S is the number of slices. For requests of length S/2, the maximum throughput is 2. A first order approximation of the dependence of the optimum throughput on the request probabilities is given by $S/\bar l$, where $\bar l$ is the average request length:

$$\bar l = \frac{p_{-1} + p_1}{1 - p_0} + 2\,\frac{p_{-2} + p_2}{1 - p_0} + \cdots + \frac{S}{2}\,\frac{p_{S/2}}{1 - p_0}$$

(Note that $\bar l$ is part of the upper bound on $g_{opt}$ developed in section 3.5.1.1.)




3. For four slices, the optimal arbiter algorithm always grants the maximum number of requests possible in every state, independent of the request probabilities. For six or more slices, the optimal arbiter algorithm does not always grant the maximum number of requests possible in every state. However, for six slices and light to medium loading, restricting the algorithm to grant the maximum number of requests in every state does not degrade the optimal throughput significantly. For heavy loading with mainly very short and very long requests, this maximum request restriction degrades the optimal throughput significantly. We expect that this degradation increases with the number of slices.

For six slices, the optimal arbiter algorithm does not always grant, in every state, the request set utilizing the maximum number of segments possible either. However, the maximum number of segments decision seemed favoured in those states in which the maximum number of requests decision was not favoured. For all request probabilities, two optimal throughputs can be compared: one subject to the maximum number of segments restriction in every state, and one subject to the maximum number of requests restriction in every state. The former is always greater than or equal to the latter. We expect that this result also holds for more than six slices.

A reasonable sub-optimal arbiter algorithm for six slices is the following:

In each state, select a request subset to grant by choosing arbitrarily from all the request subsets in that state that:

1. utilize the maximum number of segments,

2. have the maximum number of requests subject to 1, and

3. have the maximum number of longest requests subject to 1 and 2.

We expect that this algorithm is also a reasonable sub-optimal arbiter algorithm for more than six slices.

4. For deterministic grant durations of d > 1 rounds, the optimal arbiter algorithm tends to grant requests immediately for very light loading ($p_0 \approx 1$). For very heavy loading ($p_0 \approx 0$), it tends to delay and align requests so that they can be granted at intervals of d rounds. In fact, for $p_0 \to 0$ the optimal arbiter algorithm is the optimal interval algorithm, i.e. the optimal algorithm subject to the restriction that requests can only be granted at intervals of d rounds. The optimal interval algorithm is the same as the optimal algorithm for d = 1 and the equivalent request probabilities. Between the extremes $p_0 \approx 1$ and $p_0 \approx 0$, the optimal algorithm is a complex function of the request probabilities and grant duration d. For four slices, the optimum throughput can be estimated fairly closely by the exponential approximation of equation 3.18, which depends only on the optimum throughput for d = 1. We expect that equation 3.18 also yields a reasonable approximation to the optimum throughput for more than four slices.

5. The performance of the Concert Ringbus can be improved by making the access paths symmetrical and by modifying the arbiter algorithm. Results for four slices suggest the following. When counterclockwise requests predominate, the greatest improvement in performance is achieved by making the access paths symmetrical. When long requests predominate, the greatest improvement in performance is achieved by modifying the arbiter algorithm.

The performance advantage of symmetrical access paths over asymmetrical access paths is difficult to quantify, since users may adapt their behaviour to suit the topology, and thus the request probabilities may change with the topology.

Symmetrical access paths require three additional sets of drivers per slice (see Figure 3.1)†. They also require a more complex arbiter, since arbitration must also be performed for request destinations (unlike with asymmetrical access paths). As discussed in section 3.1, the Concert Ringbus arbiter is easily modified to perform this arbitration for destinations, but the number of parts required doubles.

It must be cautioned that modifying the arbiter algorithm may not improve the performance to the degree suggested by the results in this chapter, since we have ignored two important issues. These are 1) the realizability of the optimum arbiter algorithm in a reasonable amount of hardware and 2) the arbitration time required by a realization. The arbitration time obviously degrades performance and, if sufficiently large, it may negate any possible gain in performance. We have also ignored the practical requirement for a bounded request waiting time. However, provided that the maximum permissible waiting time may be sufficiently large, the degradation that this requirement imposes is minimal.

The performance of the Concert Ringbus arbiter can be improved by either of two trivial changes to the arbiter priority ROM, or possibly by both. Results for four slices indicate that these changes yield only minor improvements in performance. However, the magnitude of these improvements should increase with the number of slices.

6. A crossbar interconnection has the best performance achievable (where the interconnection must be circuit-switched with S sources and S destinations), and it is popular and well known. It is therefore interesting to compare the Ringbus and a crossbar interconnection. We make such a comparison below, dividing it into three areas: performance, hardware costs, and arbitration costs.


† One set of drivers is required for each unidirectional switch and two sets are required for each bidirectional switch.
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Performance 

The optimal throughput of the Ringbus is close to that of a crossbar when either the loading is light or short requests predominate (or both). Otherwise, the optimal throughput of the Ringbus is significantly less than that of a crossbar. This degradation in throughput relative to that of a crossbar is especially severe under heavy loading when long requests predominate.

Hardware Costs

To connect S sources to S destinations, the crossbar interconnection requires $S^2$ drivers, whereas the Symmetric Ringbus requires $6S$ drivers and the Concert Ringbus requires $3S$ drivers. The Ringbus also requires more hardware for arbitration than a crossbar does, but the difference is difficult to quantify.
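The driver counts above grow at different rates in S. The short Python sketch below uses only the counts quoted in the text ($S^2$, $6S$, $3S$). It ignores arbitration hardware and the unidirectional/bidirectional distinction in the footnote. It tabulates the counts and finds the smallest S at which the crossbar needs more drivers than each Ringbus variant. The function names are illustrative and are not part of the Concert design.

```python
def driver_counts(S: int) -> dict:
    """Driver counts quoted in the text for S sources and S destinations."""
    return {
        "crossbar": S * S,
        "symmetric_ringbus": 6 * S,
        "concert_ringbus": 3 * S,
    }


def first_s_where_crossbar_exceeds(variant: str, s_max: int = 64) -> int:
    """Smallest S for which the crossbar needs strictly more drivers than `variant`."""
    for S in range(1, s_max + 1):
        counts = driver_counts(S)
        if counts["crossbar"] > counts[variant]:
            return S
    raise ValueError("no crossover found up to s_max")


if __name__ == "__main__":
    for S in (2, 4, 8, 16):
        print(S, driver_counts(S))
    print(first_s_where_crossbar_exceeds("symmetric_ringbus"))
    print(first_s_where_crossbar_exceeds("concert_ringbus"))
```

The crossover is at S = 7 for the Symmetric Ringbus and S = 4 for the Concert Ringbus. For the eight-slice configuration simulated later in this chapter, the counts are 64, 48, and 24 respectively.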

Arbitration Costs

Arbitration for the Ringbus must be centralized, whereas arbitration for a crossbar may be distributed amongst the destinations. Consequently, an arbiter for the Ringbus, especially an optimal arbiter, can be much more complex than an arbiter for a crossbar.

Any final conclusion in comparing the Ringbus and crossbar interconnections (or any other interconnection) depends on the number of slices, the expected operating point (i.e. the request probabilities), and the relative importance of performance versus cost. Certainly, the Ringbus seems well suited to predominantly short requests and unattractive for predominantly long requests.

The scalability of the Ringbus past eight or so slices is doubtful because of the complexity of the centralized arbitration and control.
3.11  Suggestions for Future Work

The following suggestions are listed in order of perceived importance.

1. Explore the performance, hardware cost/arbitration time, and maximum waiting time tradeoffs of various algorithms and implementations in an attempt to identify an ideal arbitration algorithm and implementation. At the least, investigate various implementations of optimal or near-optimal arbitration algorithms (such as the algorithm mentioned in conclusion 3 of section 3.10).

2. Remove as many of the eleven assumptions listed in section 3.10 as possible. The most important assumption to remove is that of zero dead time. In the Concert Ringbus arbiter, as in any other arbiter implementation‡, there must be at least one round between successive nonnull requests in order to identify new requests. Other factors also contribute to a nonzero dead time in practice, such as the minimum processing time of processors and the Ringbus arbitration time (since a new request cannot be granted until the grant from the previous request, delayed by the arbitration time, terminates). We feel that a nonzero dead time is an important addition to make to improve the accuracy of our Ringbus model, especially under heavy loading.

Removing assumptions 3 and 9, so as to consider arbitrary nonnull request interarrival time and grant duration probability distributions, would be ideal. Such a generalization of our Ringbus model would not only lead to more accurate modeling of request arrivals and grant durations but also allow the removal of other assumptions. As discussed in section 3.9, a nonzero arbitration time can be treated by assuming instantaneous arbitration and suitably apportioning the arbitration time between request interarrival time and grant duration. Any start-up time, end time, or propagation delays can be treated by a similar apportioning between request interarrival time and grant duration. Unfortunately, arbitrary request interarrival and grant duration probability distributions would seem to make the Ringbus unreasonably difficult to analyze. Hence any practical generalization in this direction is likely to be just an extension of our treatment by special cases.

It would be worthwhile to consider more slices in the Ringbus model, but the large number of states required makes an exact analysis difficult and costly.

Conceptually, there is no difficulty in removing assumptions 1, 4, 5, and 6 (see the list of assumptions in section 3.10). However, there is the practical difficulty that the analysis becomes complicated. This is especially true for the removal of assumptions 4 and 6, since the symmetry that we exploited so heavily and successfully to ease the analysis will no longer exist. It would probably be best to have a specific situation in mind before pursuing the removal of any of assumptions 1, 4, 5, and 6.

‡ In making this statement, we assume that the only information available to the arbiter from a slice is whether or not a request is present and, if so, the destination of the request. This is the only information available to the arbiter in Concert.
3. Investigate the degree to which the performance of the Ringbus may be improved by making additional information, such as the number and type of requests in each Multibus queue and the waiting time so far of each request, available to the Ringbus arbiter.

4. Consider other metrics for the performance of the Ringbus, such as minimizing the maximum waiting time of requests.

5. Establish the validity of the conjecture in section 3.4.2.2 that when $p_0 = 0$, $g_{opt}^{(d)} = g_{opt}^{(1)}$, i.e. when $p_0 = 0$ the optimal average number of grants per round with deterministic grant durations of d rounds equals the optimal average number of grants per round with grant durations of 1 round, assuming the nonnull request probabilities are the same in each case.
Integration and Simulation


4.1  Introduction 


In this chapter we consider the integration of the Multibus submodel, discussed in Chapter 2, and the Ringbus submodel, discussed in Chapter 3. We describe the results of the integration for a few example cases and compare these results to those obtained via simulation of the overall Concert model. In the rest of the chapter we present and discuss the results of two different sets of simulations of the overall Concert model with eight slices. The purpose of the first set is to assess the performance of the Ringbus with different access paths and arbiter algorithms and to compare this performance with that of other interconnection architectures in an environment close to that of the actual Concert system. Such a comparison would be too computationally expensive to perform by solving the associated Markovian decision problems. The purpose of the second set is to determine the expected performance of the actual Concert system for various parameter values. The variables considered in these simulations are the number of processors in a slice, the mean processing time, and the request destination probabilities.


4.2  Integration 


Summarizing the results of sections 2.9.2 and 3.9.1, we have:


a) The Single Processor Equivalent Model

Input: $\bar t_{aRB}$ (the mean Ringbus access time)

Exogenous Inputs: N (the number of processors on a Multibus), $\bar t_p$ (the mean processing time), $\bar t_r$ (the mean recovery time), $\bar t_{aMB}$ (the mean Multibus access time), $p_i^{RB}$ (the Ringbus destination probabilities), $\beta$ (the probability of a long word access), and $\psi$ (the probability of a Ringbus access).

Outputs: $\bar t_p^{RBeqv}$ and $p_i^{RBeqv}$ (the mean processing time and destination probabilities, respectively, of the single processor equivalent model of the Multibus), and $p_{RR}$ (the conditional probability that, given a Ringbus access, that access is immediately followed by another Ringbus access)

Computation: $p_i^{RBeqv} = p_i^{RB}$ for $i = -(S/2-1), \ldots, -1, 1, \ldots, S/2$. The expressions for $\bar t_p^{RBeqv}$ and $p_{RR}$ (from section 2.9.2) are illegible in the source; they are written in terms of the mean access time

$$\bar t_a = (1+\beta)\big((1-\psi)\,\bar t_{aMB} + \psi\,\bar t_{aRB}\big).$$


b) The Ringbus Equivalent Model

Inputs: $\bar t_p^{RBeqv}$, $p_i^{RBeqv}$, $p_{RR}$

Exogenous Inputs: S (the number of slices), $\bar t_{start}$ (the mean start-up overhead), $\bar t_{arb}$ (the mean Ringbus arbitration time), $\bar t_{trans}$ (the mean Ringbus data transfer time), d (the mean duration for which Ringbus segments are allocated to a request, related to $t_{trans}$ by $d = t_{arb} + \lceil t_{trans}/c \rceil\, c$), c (the Ringbus arbiter clock period), the type of Ringbus access paths, and the Ringbus arbitration algorithm.

Output: $\bar t_{aRB}$

Computation:

$$\bar t_{aRB} = \bar t_{aRB}^{(norm)} + w_{RB}, \qquad \bar t_{aRB}^{(norm)} = \bar t_{start} + \bar t_{latch} + \bar t_{arb} + \bar t_{trans},$$

$$\bar t_{latch} = 0.5c + p_{RR}\,\big(d - \bar t_{trans} - \bar t_{arb} - 0.5c\big).$$

$w_{RB}$ is determined from

$$\frac{S}{g}\,d = \frac{p_0}{1-p_0}\,c + w_{RB} + d,$$

where g, the average number of new or continuing grants per round, is found by solving the Ringbus model with parameters $p_i = (1-p_0)\,p_i^{RBeqv}$ and $p_0$, where

$$\frac{p_0}{1-p_0}\,c = \bar t_p^{RBeqv} + \bar t_{aRB}^{(norm)} - d.$$

This is an approximation; see the footnote in section 2.9.2. Assuming all the quantities are deterministic, we have [expression illegible in the source].
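The Ringbus-side relations in (b) can be exercised numerically. The Python sketch below assumes the relations exactly as written above. It does not compute g, which requires solving the Ringbus Markov model of Chapter 3; g is instead taken from the first integration row of Table 4.1. Clamping the null-request term at zero is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def alloc_duration(t_arb: float, t_trans: float, c: float) -> float:
    """d = t_arb + ceil(t_trans / c) * c: segments are held for whole clock periods."""
    return t_arb + math.ceil(t_trans / c) * c


def latch_time(p_rr: float, d: float, t_trans: float, t_arb: float, c: float) -> float:
    """Mean latch delay: c/2 for a request arriving asynchronously, and
    d - t_trans - t_arb for one that immediately follows a Ringbus access."""
    return 0.5 * c + p_rr * (d - t_trans - t_arb - 0.5 * c)


def ringbus_wait(S: int, g: float, d: float, c: float,
                 tp_eqv: float, t_norm: float) -> float:
    """w_RB from (S/g) d = c p0/(1-p0) + w_RB + d, where the null-request
    time c p0/(1-p0) = tp_eqv + t_norm - d (clamped at 0 in this sketch)."""
    null_time = max(0.0, tp_eqv + t_norm - d)
    return S * d / g - d - null_time


if __name__ == "__main__":
    # Example 1, first row of Table 4.1: N = 1, psi = .5, t_trans = .01c.
    c, t_arb, t_start, S = 1.0, 0.0, 0.0, 4
    t_trans = 0.01 * c
    d = alloc_duration(t_arb, t_trans, c)
    # p_RR = 0: a single processor cannot have a second request queued.
    t_latch = latch_time(0.0, d, t_trans, t_arb, c)
    t_norm = t_start + t_latch + t_arb + t_trans
    g = 1.10  # taken from Table 4.1, not computed here
    w_rb = ringbus_wait(S, g, d, c, tp_eqv=3.00, t_norm=t_norm)
    print(f"w_RB = {w_rb:.2f}c, t_aRB = {t_norm + w_rb:.2f}c")
```

This prints w_RB = 0.13c and t_aRB = 0.64c, matching the first integration row of Table 4.1. Combining the two relations gives $\bar t_p^{RBeqv} + \bar t_{aRB} = Sd/g$. This is a throughput balance: each slice completes one request cycle, of mean length $Sd/g$, per nonnull Ringbus request.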


For a given set of exogenous inputs in (a) and (b), integration consists of matching the input in (a) with the output in (b) and matching the outputs in (a) with the inputs in (b). This can be done iteratively, as outlined in the following steps. The subscripts k on $\bar t_{aRB}$, $\bar t_p^{RBeqv}$, and $p_i^{RBeqv}$ denote successive estimates of the true values of these respective quantities.

1) $k \leftarrow 0$. Assume some initial value for $\bar t_{aRB}$; denote it by $(\bar t_{aRB})_0$.

2) Using $(\bar t_{aRB})_k$, determine $(\bar t_p^{RBeqv})_k$ and $(p_i^{RBeqv})_k$ for the single processor model of the Multibus.

3) Using $(\bar t_p^{RBeqv})_k$ and $(p_i^{RBeqv})_k$, determine $(\bar t_{aRB})_{k+1}$ for the Ringbus equivalent model.

4) $k \leftarrow k+1$. If the estimates of $\bar t_{aRB}$, $\bar t_p^{RBeqv}$, and $p_i^{RBeqv}$ are satisfactory, stop. Otherwise go to step 2.

The iteration can instead begin by assuming some initial values for $\bar t_p^{RBeqv}$ and $p_i^{RBeqv}$ and then estimating $\bar t_{aRB}$. Note that since we employ various approximations in obtaining the equivalent models of the Multibus and the Ringbus (principally approximating the interaction between the two models by first moments), the final estimates for $\bar t_{aRB}$ and $\bar t_p^{RBeqv}$ will not necessarily equal their true values. We did not investigate the convergence properties of the above iterative procedure. However, we found that the estimates converged rapidly whenever we used it.
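Steps 1) to 4) amount to a fixed-point iteration on $\bar t_{aRB}$. The sketch below shows only that control flow, under several simplifying assumptions:

- the two submodels are passed in as functions;
- "satisfactory" is taken to mean a change smaller than a tolerance;
- only the scalar $\bar t_p^{RBeqv}$ is exchanged, whereas the real submodels also exchange $p_i^{RBeqv}$ and $p_{RR}$;
- `toy_multibus` and `toy_ringbus` are hypothetical stand-ins with arbitrary functional forms, not the Concert submodels.

```python
from typing import Callable, Tuple


def integrate(multibus_model: Callable[[float], float],
              ringbus_model: Callable[[float], float],
              t_a_rb0: float,
              tol: float = 1e-12,
              max_iter: int = 100) -> Tuple[float, float, int]:
    """Fixed-point iteration in the spirit of steps 1-4.

    multibus_model: mean Ringbus access time -> t_p^RBeqv (step 2).
    ringbus_model:  t_p^RBeqv -> mean Ringbus access time (step 3).
    Returns (t_aRB, t_p^RBeqv, number of iterations used).
    """
    t_a_rb = t_a_rb0                       # step 1: initial guess
    for k in range(1, max_iter + 1):
        tp_eqv = multibus_model(t_a_rb)    # step 2
        t_next = ringbus_model(tp_eqv)     # step 3
        if abs(t_next - t_a_rb) < tol:     # step 4: change below tolerance
            return t_next, multibus_model(t_next), k
        t_a_rb = t_next
    raise RuntimeError("no convergence within max_iter iterations")


def toy_multibus(t_a_rb: float) -> float:
    # Hypothetical stand-in; form chosen only to give a unique stable fixed point.
    return 3.0 - 0.5 * t_a_rb


def toy_ringbus(tp_eqv: float) -> float:
    # Hypothetical stand-in: shorter equivalent processing time -> longer access.
    return 0.5 + 1.0 / (1.0 + tp_eqv)


if __name__ == "__main__":
    t_a_rb, tp_eqv, iters = integrate(toy_multibus, toy_ringbus, t_a_rb0=0.5)
    print(f"t_aRB = {t_a_rb:.4f}, t_p^RBeqv = {tp_eqv:.4f} after {iters} iterations")
```

With these toy models the map is a strong contraction near its fixed point, so the loop stops after a handful of iterations. This is consistent with, but of course no proof of, the rapid convergence reported above.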

There are two cases for which the Multibus and Ringbus models can be integrated without resorting to iteration:
Case 1: Very light Ringbus traffic

This case can arise in two ways:

i) $(\bar t_p + \beta \bar t_r)/N$ large, $\psi$ irrelevant, or

ii) $\psi$ small (i.e. $\psi \approx 0$), $(\bar t_p + \beta \bar t_r)/N$ irrelevant.

The first way corresponds to very light utilization of the Multibus, which leads to very light utilization of the Ringbus regardless of $\psi$. The second way corresponds to very weak coupling between the Multibus and the Ringbus, which leads to very light utilization of the Ringbus regardless of the utilization of the Multibus. Of course, very light Ringbus traffic can be achieved both ways simultaneously. However, in our treatment below we choose to consider each way as a distinct subcase.

Case 1(i): $(\bar t_p + \beta \bar t_r)/N$ large, $\psi$ irrelevant

For $(\bar t_p + \beta \bar t_r)/N$ sufficiently large, we have $\bar t_w \approx 0$, $p_N \approx 0$ (and hence $\bar t_{latch} \approx 0.5c$),

$$\bar t_p^{RBeqv} \approx \frac{\bar t_p + \beta \bar t_r}{(1+\beta)\,N\psi},$$

$w_{RB} \approx 0$, and $\bar t_{aRB} \approx \bar t_{start} + 0.5c + \bar t_{arb} + \bar t_{trans}$. Thus $\bar t_p^{RBeqv}$, $\bar t_{aRB}$, and all the other quantities of interest can be found without resorting to iteration.

Case 1(ii): $\psi$ small (i.e. $\psi \approx 0$) and $(\bar t_p + \beta \bar t_r)/N$ irrelevant

In this case $\bar t_p^{RBeqv}$ is very large, $p_{RR} \approx 0$ (and hence $\bar t_{latch} \approx 0.5c$), $w_{RB} \approx 0$, and $\bar t_{aRB} \approx \bar t_{start} + 0.5c + \bar t_{arb} + \bar t_{trans}$. $\bar t_w$ may be computed by taking $\bar t_a \approx (1+\beta)\,\bar t_{aMB}$. Note that it is the possibly large value of $\bar t_w$ that differentiates this subcase from the previous one.


Case 2: Very heavy Ringbus traffic

This case arises when $(\bar t_p + \beta \bar t_r)/N$ is small and $\psi$ is large (i.e. $\psi \approx 1$). For $(\bar t_p + \beta \bar t_r)/N$ sufficiently small, the Multibus is saturated (i.e. $N \gg N^*$, where $N^*$ is the saturation point of the Multibus). Hence $\bar t_w \approx (N - N^*)\,\bar t_a$, yielding $\bar t_p^{RBeqv} \approx \bar t_{aMB}(1-\psi)/\psi$ as in section 2.9.2. In addition $p_N \approx 1$, and thus $p_{RR} \approx$ [expression illegible in the source]. With $\psi \approx 1$, $\bar t_p^{RBeqv} \approx 0$ and $p_{RR} \approx 1$, hence $\bar t_{latch} \approx d - \bar t_{arb} - \bar t_{trans}$. Therefore $\bar t_{aRB}^{(norm)}$ is a constant.

Assuming d, $t_{arb}$, and $t_{trans}$ are deterministic random variables, we have [expression illegible in the source].

Once $p_0$ and the corresponding g are determined, $w_{RB}$ is given by

$$\frac{S}{g}\,d = \frac{p_0}{1-p_0}\,c + w_{RB} + d.$$

Finally, $\bar t_{aRB} = \bar t_{start} + \bar t_{latch} + \bar t_{arb} + \bar t_{trans} + w_{RB}$. The waiting time $\bar t_w$ and all the other quantities of interest can then be found without resorting to iteration.

Note that for small enough $(\bar t_p + \beta \bar t_r)/N$ and $\psi$ close enough to 1, $\bar t_p^{RBeqv} \approx 0$ regardless of the various probability distributions in the Multibus and Ringbus models. In this case the probability distribution of $t_p^{RBeqv}$ is described very accurately by its mean $\bar t_p^{RBeqv}$. (The description in fact becomes exact for $\psi = 1$ as $(\bar t_p + \beta \bar t_r)/N \to 0$, since $\bar t_p^{RBeqv} \to 0$.) Hence our first moment approximation of the Multibus to Ringbus interaction is very accurate for $(\bar t_p + \beta \bar t_r)/N$ small and $\bar t_p^{RBeqv} \approx 0$.

We now consider some example cases. In each case we determine $\bar t_p^{RBeqv}$ and $\bar t_w$ for the Multibus model and $\bar t_{aRB}$ and $w_{RB}$ for the Ringbus model.

All the simulations reported in this section and in this chapter are simulations of the overall Concert model. As discussed in section 1.3.5, this overall model consists of a model for each Multibus and a model for the Ringbus. As a model for each Multibus we choose the Multibus model with long word and Ringbus accesses discussed in section 2.9. As a model for the Ringbus, we choose the basic model discussed in section 3.1. This model depends, of course, on the particular arbitration algorithm and access paths desired. In simulating the overall Concert model, we simulate each Multibus model, the Ringbus model, and the interaction between the Multibus models and the Ringbus model. Since our Multibus models are continuous time models, we simulate them in continuous time, and since our Ringbus models are discrete time models, we simulate them in discrete time. We simulate the Multibus models as operating asynchronously with respect to the Ringbus model; thus our simulations include the effect of synchronizing the Multibus signals with the Ringbus arbiter clock. The parameters of our simulations are as follows:
Multibus model:

- the number of processors on a Multibus, N.

- the processing time distribution, with mean $\bar t_p$.

- the recovery time distribution, with mean $\bar t_r$.

- the Multibus access time distribution, with mean $\bar t_{aMB}$.

- the probability of a long word access, $\beta$.

- the probability of a Ringbus access, $\psi$.

- the Ringbus destination probabilities, $p_i^{RB}$.

Ringbus model:

- the number of slices, S.

- the arbiter algorithm.

- the Ringbus access paths.

- the arbiter clock period, c.

- the start-up overhead, $\bar t_{start}$. (Taken as a constant.)

- the Ringbus arbitration time, $\bar t_{arb}$. (Taken as a constant and an integral multiple of c.)

- the probability distribution of the Ringbus data transfer time, with mean $\bar t_{trans}$. (Note that the duration d for which segments are allocated to a Ringbus request is related to $t_{trans}$ by $d = t_{arb} + \lceil t_{trans}/c \rceil\, c$.)

Other:

- the block size, B. Each simulation was run until processor 1 on Multibus 1 (this numbering is arbitrary) completed 30 + B processor cycles (a cycle being processing time, waiting, and access time for a word or long word). To remove the effect of transients, statistics gathering did not begin until processor 1 on Multibus 1 had completed 30 accesses.

- the number of block repetitions, R. The statistics reported are based on R repetitions of each simulation.
We remind the reader of our basic assumptions, which apply to our simulations as well:

• We assume that all the random variables $t_p$, $t_r$, $t_{aMB}$, and $t_{trans}$ and all the probabilities $\beta$, $\psi$, and $p_i^{RB}$ are mutually independent and stationary.

• We assume that each Multibus model has exactly the same parameters, so all Multibus models are identical in every respect.

• We assume that the Ringbus model is completely symmetric with respect to each Multibus interconnection.

In all our simulations we assume in addition that:

1) the processing time is exponentially distributed,

2) the recovery time is deterministic, and

3) the Multibus access time is deterministic.

We caution that the following examples were chosen for purposes of illustration. They do not represent the actual Concert system, and they do not represent an in-depth study or analysis of integration. In each case we assume that the Ringbus arbiter has zero arbitration time (i.e. $\bar t_{arb} = 0$) and that there are no long word accesses (i.e. $\beta = 0$, and hence we take $\bar t_r = 0$ and $\gamma = 0$). In addition, we take $\bar t_{start} = 0$ and $\bar t_{aMB} = 1.0c$.

Example 1: $\bar t_p = 1.0c$, S = 4, deterministic grant duration of one round (i.e. d = c), optimal arbiter for deterministic grant duration of one round, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, and $p_2^{RB} = .2$.

Table 4.1 presents the integration and simulation results for various values of N, $\psi$, and $t_{trans}$ (which is deterministic here).
N  ψ    t_trans/c  Source       t_p^RBeqv/c  t_w/c        t_latch/c    t_aRB/c†  w_RB/c       g
1  0.5  0.01       Integration  3.00         0.0          0.50         0.64      0.13         1.10
                   Simulation   2.95±.20     0.0          0.54±.01     0.68      0.13±.03     1.10±.06
1  0.5  0.98       Integration  3.00         0.0          0.50         1.55      0.07         0.88
                   Simulation   3.00±.16     0.0          0.53±.02     1.60      0.09±.07     0.87±.03
1  1.0  0.01       Integration  1.00         0.0          0.50         0.87      0.36         2.14
                   Simulation   1.00±.06     0.0          0.58±.01     0.94      0.35±.01     2.05±.07
1  1.0  0.98       Integration  1.00         0.0          0.50         1.74      0.26         1.46
                   Simulation   1.00±.03     0.0          0.58±.02     1.73      0.17±.02     1.47±.02
2  0.5  0.01       Integration  1.53         0.42         0.62         0.78      0.16         1.73
                   Simulation   1.40±.09     0.40±.02     0.77±.01     1.02      0.24±.02     1.64±.06
2  0.5  0.98       Integration  1.44         0.69         0.37         1.50      0.15         1.36
                   Simulation   1.34±.03     0.54±.02     0.29±.01     1.46      0.19±.03     1.43±.03
2  1.0  0.01       Integration  0.22         0.73         0.78         1.30      0.51         2.64
                   Simulation   0.17±.01     0.70±.01     0.88±.01     1.38      0.49±.01     2.58±.04
2  1.0  0.98       Integration  0.19         0.99         0.21         1.60      0.42         2.23
                   Simulation   0.13±.01     0.84±.04     0.18±.01     1.59      [illegible]  2.32±.02
4  0.5  0.98       Integration  1.02         2.76         0.27         1.48      0.23         1.60
                   Simulation   1.00±.05     2.41±.05     0.03±.01     1.28      0.27±.03     1.75±.03
4  1.0  0.98       Integration  0.01         3.55         0.03         1.51      0.50         2.64
                   Simulation   0.00±.001    3.51±.07     [illegible]  1.50      0.50±.02     2.66±.03
6  0.5  0.98       Integration  1.00         [illegible]  0.26         1.47      0.23         1.62
                   Simulation   1.00±.04     4.67±.01     [illegible]  1.27      0.27±.01     1.76±.03
6  1.0  0.98       Integration  0.00         6.58         0.02         1.52      0.52         2.64
                   Simulation   0.00±.001    6.54±.06     0.02±.001    1.51      0.51±.01     2.64±.02

Table 4.1: $\bar t_p = 1.0c$, S = 4, deterministic grant duration of one round, i.e. d = c, optimal arbiter for deterministic grant duration of one round, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, and $p_2^{RB} = .2$.


In general, the integration and simulation results agree rather closely. The results are closest for light Ringbus loading (N = 1, $\psi = .5$) and very heavy Ringbus loading (N = 4 and 6, $\psi = 1.0$). This is to be expected since

1) the analytical formulae describing the Multibus model are the most accurate for general probability distributions for $N \ll N^*$ and $N \gg N^*$, where $N^*$ is the Multibus saturation point, and

2) the first moment approximation of the Multibus-Ringbus interaction is fairly accurate for light Ringbus loading (with N = 1) and heavy Ringbus loading (with $N \gg N^*$). In the first case, since Ringbus traffic is light, $w_{RB} \approx 0$ and $\bar t_{latch} \approx 0.5c$. Thus $\bar t_{aRB} \approx \bar t_{start} + 0.5c + \bar t_{arb} + \bar t_{trans}$. Since N = 1, the Multibus queue (in the Multibus model) is quasi-reversible regardless of the Multibus and Ringbus access time distributions, and thus the analytical formulae of the Multibus are exact given only the mean Ringbus access time, $\bar t_{aRB}$. In the second case, both the Multibus and the Ringbus are saturated. In saturation only the means of the various quantities are required to determine $\bar t_w$, $w_{RB}$, and $\bar t_{aRB}$.

Note that light Ringbus loading and very heavy Ringbus loading are the two cases, discussed earlier, for which integration can be performed without iteration.

† $\bar t_{aRB}$ was not one of the statistics gathered by the simulations. In each row corresponding to a simulation in this table and in the other tables, $\bar t_{aRB}$ was computed from the relation $\bar t_{aRB} = \bar t_{start} + \bar t_{latch} + w_{RB} + \bar t_{trans}$, where $\bar t_{start} \approx$ …

The results for various values of $\bar t_{trans}$ (for N = 1 and 2) are presented in Table 4.1 to determine the effect of $\bar t_{trans}$ on the accuracy of the integration results. In all of the Ringbus models that we investigated in detail in Chapter 3 (i.e. the models in sections 3.3, 3.4, 3.5, and 3.8), including the optimal arbiter with a deterministic grant duration of one round as in Example 1, we assumed that the probability $p_0$ of a null Ringbus request was independent of all other requests on the Ringbus. However, the probability of a null request at the Ringbus in our Concert model can depend on the previous requests at the Ringbus, for the following reason.

First we introduce some terminology. We term a request latched by the Ringbus arbiter a latched Ringbus request, or an LRR request for short. In addition, we call the arrival of a nonnull Ringbus request from a Multibus an arrival event. Now, if the previous LRR request at a slice is a null request, then the next LRR request at that slice will also be a null request if there is no arrival event at that slice in the arbiter clock period following the latching of the previous null request. On the other hand, if the previous LRR request at a slice is a nonnull request, then the next LRR request at that slice will be a null request if there is no arrival event at that slice in the interval between the termination of the Ringbus access (the data transfer, not the interval for which segments are allocated) of the previous LRR request and the next latching instant. These two situations are depicted in Figure 4.1. (Remember that $\bar t_{start} = 0$ and $\bar t_{arb} = 0$ here.)


[Timing diagram: in each situation an interval is marked "No nonnull Ringbus request can arrive in this interval (from this slice)".]

Figure 4.1: Two situations leading to a null request

Thus a null Ringbus request follows a null Ringbus request if no Ringbus request arrives from the Multibus in an interval of length c, and a null Ringbus request follows a nonnull Ringbus request if no Ringbus request arrives from the Multibus in an interval of length $d - t_{trans}$. Note that if $t_{trans} = d$, then a null Ringbus request must follow every nonnull Ringbus request. The Ringbus model in this example (and in all the other examples) does not incorporate a null request after every nonnull request. To avoid this situation, we take $t_{trans} = d - \epsilon$ for some constant $\epsilon$, $0 < \epsilon < c$, in all the cases in this section.

If N = 1, then with our assumption of exponential processing time, the probability of a null LRR request following a null LRR request is determined by the interval length c. The probability of a null LRR request following a nonnull LRR request is determined by the interval length $d - t_{trans}$ ($t_{trans}$ and d are deterministic in this example). By taking $t_{trans}$ very small (.01c), so that $d - t_{trans} \approx c$, we minimize the dependency of the probability of a null LRR request on the previous LRR request at the same slice. By taking $t_{trans}$ large (.98c), we increase this dependency.
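The effect of $t_{trans}$ on this dependency can be made concrete with a small Python sketch. It assumes N = 1 and $\psi = 1$, so that the processor issues a Ringbus request after each processing period, which begins when its previous access ends. It also assumes exponential processing times with mean $\bar t_p = c$, as in Example 1. Under these assumptions the next arrival is memoryless in both situations of Figure 4.1, and a null LRR request follows exactly when no arrival occurs in the relevant interval. The function name is hypothetical.

```python
import math


def p_null_next(interval: float, mean_interarrival: float) -> float:
    """P(no new request arrives within `interval`) for exponential interarrival times."""
    return math.exp(-interval / mean_interarrival)


if __name__ == "__main__":
    c = 1.0        # arbiter clock period (time unit)
    d = c          # deterministic grant duration of one round, as in Example 1
    t_p = 1.0 * c  # mean processing time, as in Example 1
    for t_trans in (0.01 * c, 0.98 * c):
        after_null = p_null_next(c, t_p)               # interval c
        after_nonnull = p_null_next(d - t_trans, t_p)  # interval d - t_trans
        print(f"t_trans = {t_trans:.2f}c: P(null | null) = {after_null:.3f}, "
              f"P(null | nonnull) = {after_nonnull:.3f}")
```

For $t_{trans} = .01c$ the two conditional probabilities are nearly equal (about 0.37 each). For $t_{trans} = .98c$ they differ sharply (about 0.37 versus 0.98), which is the dependency that the integrated Ringbus model ignores.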

If N is large and $\psi$ is large, so that the Multibus queue is nearly always nonempty with Ringbus requests, then $\bar t_p^{RBeqv} \approx 0$. The likelihood of a nonnull Ringbus request arriving in c or in $d - t_{trans}$ is then about the same (as long as $d > t_{trans}$). Thus the interval $d - t_{trans}$ has little effect on the probability of a null LRR request for large N and $\psi \approx 1.0$.

We therefore expect the probability of a null LRR request to depend quite heavily on $d - t_{trans}$ for light Ringbus traffic, and this dependence to diminish as the Ringbus traffic increases. As we stated earlier, the Ringbus model used in the integration does not incorporate the dependency of null request probabilities on $d - t_{trans}$. It might be expected that the integration results would be most accurate when $d - t_{trans}$ is adjusted to reduce this dependency. Indeed, this does seem to be the case for N = 1 and $\psi = 1.0$: the integration and simulation results for $t_{trans} = .01c$ are closer than those for
Example 3: $\bar t_p = 1.0c$, S = 4, geometrically distributed grant duration with mean d = 4c, optimal arbiter for geometric duration with mean d = 4c, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, and $p_2^{RB} = .2$.

The integration and simulation results are shown in Table 4.3. In obtaining these results we took $t_{trans} = d - .02c$, hence $\bar t_{latch} \approx .02c$ for heavy Ringbus loading. Note that once again the integration and simulation results agree rather closely.


N  ψ    Source       t_p^RBeqv/c  t_w/c       t_latch/c  t_aRB/c  w_RB/c     g
1  0.5  Integration  3.00         0.0         0.50       6.58     2.10       1.67
        Simulation   2.95±.40     0.0         0.51±.03   6.50     2.01±.35   1.66±.09
1  1.0  Integration  1.00         0.0         0.50       7.21     2.73       1.95
        Simulation   1.00±.10     0.0         0.57±.06   7.20     2.65±.44   1.92±.04
2  0.5  Integration  1.20         3.20        0.31       7.01     2.72       1.95
        Simulation   1.18±.12     3.22±.26    0.27       7.06     2.81±.53   1.94±.14
2  1.0  Integration  0.06         6.43        0.08       7.31     3.26       2.17
        Simulation   0.03±.01     6.28±.78    0.07±.01   7.30     3.25±.57   2.16±.12
4  0.5  Integration  1.00         11.12       0.26       7.08     2.84       1.98
        Simulation   1.03±.08     10.86±.74   0.20±.01   7.11     2.93±.33   2.02±.09
4  1.0  Integration  0.00         20.92       0.02       7.31     3.31       2.19
        Simulation   0.00±.01     20.99±.82   0.02±.01   7.36     3.36±.12   2.17±.03

Table 4.3: $\bar t_p = 1.0c$, S = 4, geometrically distributed grant duration with mean d = 4c, optimal arbiter for geometric duration with mean d = 4c, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, and $p_2^{RB} = .2$.


Example 4: S = 4, deterministic grant duration of one round, i.e. d = c, optimal arbiter for deterministic grant duration of one round, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, $p_2^{RB} = .2$, and $t_{trans} = .98c$ (i.e. $t_{trans}$ deterministic).

This example is the same as Example 1 except for the value of $\bar t_p$. The object of this example is to examine the accuracy of the integration results when the Multibus is operating in the knee region, i.e. for $N \approx N^*$. We have already seen in the previous examples, and have discussed, that the integration results are the most accurate for light Ringbus loading and very heavy Ringbus loading.


We attempted to keep

$$\frac{\bar t_p + \beta \bar t_r}{(1+\beta)\big((1-\psi)\,\bar t_{aMB} + \psi\,\bar t_{aRB}\big)}$$

approximately equal to 5. (This corresponds to $a \approx 5$ in the M/M/1//N model discussed in section 2.4.) For this value of a, $N^*$ is approximately 6; thus we consider N = 2, 4, 6, and 8. For $\psi = .5$ we took $\bar t_p = 6.0c$, and for $\psi = 1.0$ we took $\bar t_p = 7.5c$. (Recall that $\bar t_r = 0$ and $\beta = 0$.) The corresponding integration and simulation results are shown in Table 4.4.
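The choice of $\bar t_p$ for each $\psi$ can be checked with a short Python sketch. It evaluates the ratio above and assumes the standard machine-repairman saturation point $N^* = 1 + a$, which agrees with the text's pairing of $a \approx 5$ with $N^* \approx 6$. It also assumes $\bar t_{aRB} \approx 1.5c$; the integration values of $\bar t_{aRB}$ in Table 4.4 lie between about 1.4c and 1.55c. The function names are hypothetical.

```python
def think_to_service_ratio(t_p: float, t_r: float, beta: float, psi: float,
                           t_a_mb: float, t_a_rb: float) -> float:
    """a = (t_p + beta t_r) / [(1 + beta)((1 - psi) t_aMB + psi t_aRB)]."""
    t_a = (1.0 + beta) * ((1.0 - psi) * t_a_mb + psi * t_a_rb)
    return (t_p + beta * t_r) / t_a


def saturation_point(a: float) -> float:
    """Machine-repairman saturation point N* = 1 + a (assumed definition)."""
    return 1.0 + a


if __name__ == "__main__":
    c = 1.0
    for psi, t_p in ((0.5, 6.0 * c), (1.0, 7.5 * c)):
        a = think_to_service_ratio(t_p, 0.0, 0.0, psi, 1.0 * c, 1.5 * c)
        print(f"psi = {psi}: a = {a:.2f}, N* = {saturation_point(a):.2f}")
```

This gives $a = 4.8$ ($N^* = 5.8$) for $\psi = .5$ and $a = 5.0$ ($N^* = 6.0$) for $\psi = 1.0$. So N = 2, 4, 6, and 8 span the knee region as intended.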


N  ψ    t_trans/c  Source       t_p^RBeqv/c  t_w/c      t_latch/c         t_aRB/c      w_RB/c     g
2  0.5  0.98       Integration  6.05         0.26       0.46              1.50         0.06       0.53
                   Simulation   5.81±.48     0.13±.01   0.46±[illegible]  1.50         0.06±.01   0.55±[illegible]
4  0.5  0.98       Integration  2.58         0.79       0.39              1.42         0.06       1.00
                   Simulation   2.42±.07     0.52±.02   0.33±.01          1.43         0.12±.01   1.04±.02
6  0.5  0.98       Integration  1.54         1.85       0.33              1.49         0.19       1.32
                   Simulation   1.45±.03     1.23±.02   0.19±.01          1.36         0.19±.01   1.42±.02
8  0.5  0.98       Integration  1.16         3.41       0.29              1.51         0.24       1.50
                   Simulation   1.10±.02     2.44±.06   0.08±.01          1.31         0.25±.01   1.66±.01
2  1.0  0.98       Integration  3.12         0.26       0.42              1.54         0.14       0.86
                   Simulation   3.08±.13     0.15±.02   0.43±.01          1.51         0.10±.02   0.87±.02
4  1.0  0.98       Integration  1.00         0.96       0.28              1.49         0.23       1.61
                   Simulation   0.91±.04     0.65±.03   0.27±.01          1.50         0.25±.01   1.66±.03
6  1.0  0.98       Integration  0.35         2.19       0.15              [illegible]  0.39       2.14
                   Simulation   0.27±.01     1.75±.06   0.14±.01          [illegible]  0.41±.01   2.23±.01
8  1.0  0.98       Integration  [illegible]  4.04       0.08              1.53         0.47       2.45
                   Simulation   0.06±.01     3.63±.06   0.06±.01          1.53         0.49±.01   2.53±.02

Table 4.4: S = 4, deterministic grant duration of one round, i.e. d = c, optimal arbiter for deterministic grant duration of one round, symmetrical access paths, $p_{-1}^{RB} = p_1^{RB} = .4$, $p_2^{RB} = .2$, and $t_{trans} = .98c$ (i.e. deterministic).


In every case listed in Table 4.4, t_w^integration ≥ t_w^simulation. (The superscripts denote how the quantities were obtained.) This is not surprising, especially for ψ = .5, since the access times have a large deterministic component. We have already seen in Chapter 2 that the Multibus model with a server-sharing queue overestimates t_w if the access time distributions are deterministic. Here, the Multibus access time distribution is entirely deterministic and the Ringbus access time has a large deterministic component: the Ringbus access time is at least t_trans, where t_trans is deterministic.

In addition, t_peqv^integration > t_peqv^simulation and g^integration < g^simulation in every case listed. These are obviously related: a larger value of t_peqv means that each Multibus issues Ringbus requests less often, and hence a smaller value of g. Also, t_peqv^integration is related to t_w^integration: if t_w^integration is larger than it should be, then t_peqv^integration is likely to be larger than it should be. In addition, it is likely that the probability distribution of t_peqv is skewed more towards shorter times than that predicted by our geometric approximation of it. This would cause the Ringbus to be more heavily loaded in actuality (i.e. in the simulation) than predicted by integration; hence the actual throughput of the Ringbus would be greater than predicted by integration. Since the probability distribution of t_peqv would be more skewed towards shorter times for larger N, this effect might explain why the difference g^simulation − g^integration increases with N.

Discussion 

The results predicted by integration of the Multibus and Ringbus models agree fairly closely with simulation results of the overall Concert model for the four examples considered. We observed that the integration results were most accurate for light Ringbus loading (N ≪ N*, small ψ) and for very heavy Ringbus loading (N ≫ N*, ψ ≈ 1). This is in fact a general result for integration, as we discussed earlier, and can be justified analytically. We performed the integration for several other examples with S = 4 and observed the same general trends as in the four examples reported. We did not perform any integration for S > 4, although we expect the same general trends there.

The accuracy of the integration results in the knee areas (i.e. for N ≈ N*) will depend strongly on the various probability distributions, as we saw with the Multibus models in Chapter 2. A great deal of further work is required to clarify and characterize the accuracy of our integration technique in the knee area.

Certainly, our four examples demonstrate that our integration technique works and that it is a viable approach if accuracy is not paramount. If greater accuracy is desired from the integration, then the interactions between the Multibus and Ringbus models will have to be approximated by more than just first moments. However, this will be difficult, and probably infeasible, in most cases when dealing with analytical models for the Multibus and Ringbus.

4.3 Simulation I: The Ringbus in the Concert Environment

In this section we present and discuss the results of a series of simulations to assess the performance of the Ringbus (with eight slices) in the Concert environment.

We have already discussed our simulation model and its parameters in conjunction with the simulations reported in section 4.2. To recap, our simulation model is the overall Concert model, comprising a Multibus model (one for each Multibus) and a Ringbus model. For the Multibus we assume the Multibus model with long word and Ringbus accesses discussed in section 2.9. The Ringbus model depends on the arbitration algorithm and the Ringbus access paths. Once again, our standing assumptions are:

each Multibus model has exactly the same parameters, so all Multibus models are identical in every respect

all the random variables (the processing time t_p, the recovery time t_r, the Multibus access time, and the Ringbus data transfer time t_trans) and all the probabilities β, φ, and ψ are mutually independent and stationary

the Ringbus model is completely symmetric with respect to each Multibus interconnection.

The simulations include the Multibus-Ringbus interaction. In particular, the simulation model faithfully incorporates the fact that a request from a Multibus cannot be latched by the Ringbus arbiter until the grant for the previous request from that Multibus has terminated, as is the case in the actual Concert system.

In these simulations we assume, in addition to the previous assumptions, that:

1) the processing time is exponentially distributed

2) there are no long word accesses, i.e. β = 0 (hence the recovery time distribution is irrelevant)

3) there are only Ringbus accesses, i.e. ψ = 1 (hence the Multibus access time distribution is irrelevant)

4) the start-up time is zero

5) the Ringbus data transfer time t_trans is deterministic, and hence the duration d for which segments are allocated to a Ringbus request is constant (we have already assumed in section 4.2 that d is a deterministic integral multiple of c†).

The restrictions β = 0 and ψ = 1 may seem severe, but we make them because of space and time constraints. To some degree, the effect of ψ can be determined by varying t_p with ψ held constant at 1.0. We make the assumption ψ = 1 in particular because we are chiefly interested in the performance of the Ringbus.

† c is the Ringbus arbiter clock period.

The parameters in the simulations are as follows:

1) The number of processors on a Multibus, N. We take N = 1, 2, and 4.

2) The mean processing time, t_p. We take t_p = 5.0c, 10.0c, 20.0c, 50.0c, and sometimes 100.0c and 200.0c.

3) The Ringbus destination probabilities, p_i^RR. We consider three different sets of Ringbus destination probabilities: asymmetrical, symmetrical, and uniform, as listed in Table 4.5.


Distribution   p_1^RR   p_2^RR   p_3^RR   p_4^RR   p_{-3}^RR   p_{-2}^RR   p_{-1}^RR
Asymmetrical   .4324    .2162    .1081    .0541    .0270       .0541       .1081
Symmetrical    .2759    .1379    .0690    .0345    .0690       .1379       .2759
Uniform        .1429    .1429    .1429    .1429    .1429       .1429       .1429

Table 4.5: Ringbus destination probabilities


Both the asymmetrical and symmetrical Ringbus destination probability distributions are negative binary exponential distributions, where the exponent is the smallest number of segments required to connect the source and destination. That is, for both the asymmetrical and symmetrical distributions,

    p_i^RR = C 2^{-seg(i)}

where seg(i) is the smallest number of segments required to connect the source slice to the destination slice i slices away from the source slice, and C is a normalizing constant. (Recall that the sign of i denotes the direction around the Ringbus.) For the asymmetrical distribution, seg(i) is computed assuming asymmetrical Ringbus access paths, and for the symmetrical distribution, seg(i) is computed assuming symmetrical Ringbus access paths. For example, the minimum number of segments required to connect a slice to its neighbouring slice in the clockwise direction is one for both the asymmetrical and symmetrical access paths. Thus p_1^RR(asym) = C_asym 2^{-1} and p_1^RR(sym) = C_sym 2^{-1}. On the other hand, the minimum number of segments required to connect a slice to its neighbouring slice in the counterclockwise direction is three for asymmetrical access paths and one for symmetrical access paths. Thus p_{-1}^RR(asym) = C_asym 2^{-3} and p_{-1}^RR(sym) = C_sym 2^{-1}.

The asymmetrical and symmetrical distributions are intended to reflect the different distributions of accesses that would be plausible with the respective asymmetrical and symmetrical access paths if the accesses exhibited locality. The uniform distribution is intended to reflect the distribution of accesses if the accesses exhibited no particular locality.

4)  The  Ringbus  arbiter  algorithm.  We  consider  five  different  arbiter  algorithms: 


i) The rotating priority (with counterclockwise priority rotation) algorithm discussed in Chapter 3 and section 1.2.3. This is the algorithm employed in the actual Concert system.

ii) The greedy algorithm. This algorithm pursues a maximum reward strategy: in every arbiter clock cycle it grants the maximum number of requests that it can. Ties between request sets with the same reward are broken in favour of the request set with the greatest number of the largest requests. Any ties remaining after this point are broken arbitrarily.

iii) The two phase greedy interval algorithm. This algorithm is best described by first considering a single phase greedy interval algorithm. Such an algorithm alternates between an idle interval and a grant interval. No nonnull requests are granted during the idle interval. The idle interval terminates when the first nonnull request arrives at the Ringbus arbiter, if there currently are no pending nonnull requests latched by the arbiter, or it terminates t_arb + c after the end of the previous grant interval, if there is at least one nonnull Ringbus request ungranted from the previous grant interval. This minimum idle interval of t_arb + c corresponds to the minimum time between the termination of a Ringbus grant and the initiation of the next Ringbus grant from the same slice.

As the name implies, nonnull requests are granted only during the grant interval, which extends from the termination of the idle interval until the Ringbus access corresponding to each granted request has completed. The actual arbitration (i.e. deciding which request set to grant) is done only at the beginning of a grant interval. The same greedy algorithm discussed in 4(ii) performs the arbitration at this point. All grants remain in effect unchanged until their respective Ringbus accesses terminate.

Thus the duration of a grant interval is determined by the longest access time of those requests granted. This could be a problem if there were high variability in the Ringbus data transfer time, t_trans. However, we assume that t_trans is deterministic (see 6)). This, of course, ignores read-modify-write accesses, for which t_trans would be much greater than for reads or writes. The arbiter algorithm can be modified to deal with such accesses. One way is to terminate a grant interval when all non-read-modify-write accesses terminate and to allow grants corresponding to read-modify-write accesses to carry on into the next grant interval.

Now a two phase greedy interval algorithm consists of one single phase greedy interval algorithm, which we call the primary phase, and a second single phase greedy interval algorithm, delayed by t_arb + c with respect to the primary phase. We call this second phase the secondary phase.

The single phase greedy interval algorithm was motivated by the finding in section 3.4 that in heavy traffic the optimal arbiter algorithm for four slices and deterministic grant durations of d rounds tends to align the requests so that they are granted at intervals of d rounds. Thus we expected the single phase greedy interval algorithm to yield good performance in heavy traffic. We found, however, that it actually yielded performance that was usually worse than that of the rotating priority algorithm (with symmetrical access paths). Presumably, this was due to the idle interval of duration t_arb + c during which no requests are granted. We added the secondary phase in an attempt to improve the utilization of the Ringbus segments and hence improve the throughput.

iv) The crossbar algorithm. With this algorithm the Ringbus is transformed into a crossbar interconnection.

v) The commonbus algorithm. With this algorithm the Ringbus is transformed into a single time-shared common bus.

5) The Ringbus access paths. For the rotating priority arbiter algorithm, we consider both asymmetrical and symmetrical access paths. For the greedy and the greedy interval algorithms we consider only symmetrical access paths. The issue of asymmetrical or symmetrical access paths is irrelevant for the crossbar and commonbus algorithms.

6) The Ringbus data transfer time, t_trans. In all cases we take t_trans = 7c, as a rough approximation of the case in the actual Concert system (when c = 200 nsec; see section 3.3.2 of Appendix A).

(Note: there is no point in taking t_trans = 6.98c here as we would have done in section 4.2. The reason is that no new requests can be latched by the Ringbus arbiter until t_arb after the Ringbus access (i.e. data transfer) has terminated [since the grant corresponding to this access continues for t_arb past the termination of the access]. Since t_arb ≥ c here [see below], the minimum interval between the termination of an access and the latching of the next request from the same Multibus is always at least c. For t_trans = 7c this interval is t_arb, and for t_trans = 6.98c this interval is t_arb + .02c. The difference between the probability of a request arriving from a Multibus in an interval of t_arb and in an interval of t_arb + .02c is negligible.)

7) The Ringbus arbitration delay, t_arb. We take t_arb = 2c, as in the actual Concert system, for the rotating priority, greedy, and greedy interval algorithms. (Hence d = 9c for these three algorithms.) For the crossbar and commonbus algorithms, we take t_arb = c to reflect the greater simplicity inherent in the arbiter algorithm in these cases. (Hence d = 8c for these two algorithms.)

8) The block size B and the run size R. In all cases we took B = 100 and R = 10.
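The entries of Table 4.5 can be regenerated from p_i^RR = C 2^{-seg(i)}. The sketch below is illustrative only. The column order 1, 2, 3, 4, -3, -2, -1 is an assumption about Table 4.5, and of the segment counts only seg(1) = 1 and seg(-1) = 3 (asymmetrical) and seg(±1) = 1 (symmetrical) are stated in the text; the remaining asymmetrical counts (seg(i) = i for i = 1 to 4, seg(-k) = k + 2) are assumed by matching the table rather than derived from the access-path hardware.

```python
NUM_SLICES = 8

# Offsets i (slices away from the source; the sign gives the direction around the
# Ringbus, positive = clockwise), in the column order assumed for Table 4.5.
OFFSETS = [1, 2, 3, 4, -3, -2, -1]

# Minimum number of segments seg(i). Symmetrical paths: the shorter way round.
SEG_SYMMETRICAL = {i: min(abs(i), NUM_SLICES - abs(i)) for i in OFFSETS}
# Asymmetrical paths: seg(1) = 1 and seg(-1) = 3 are stated in the text; the other
# entries are assumptions chosen to match Table 4.5.
SEG_ASYMMETRICAL = {1: 1, 2: 2, 3: 3, 4: 4, -3: 5, -2: 4, -1: 3}


def destination_probs(seg):
    """Negative binary exponential distribution p_i = C * 2**(-seg(i))."""
    raw = {i: 2.0 ** -seg[i] for i in OFFSETS}
    c = 1.0 / sum(raw.values())  # normalizing constant C
    return {i: c * w for i, w in raw.items()}


def uniform_probs():
    """No locality: every other slice is equally likely."""
    return {i: 1.0 / len(OFFSETS) for i in OFFSETS}


for name, probs in [("Asymmetrical", destination_probs(SEG_ASYMMETRICAL)),
                    ("Symmetrical", destination_probs(SEG_SYMMETRICAL)),
                    ("Uniform", uniform_probs())]:
    print(f"{name:<13}" + " ".join(f"{probs[i]:.4f}" for i in OFFSETS))
```

Rounded to four places, the three printed rows agree with Table 4.5.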
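A minimal sketch of one arbiter clock cycle of the greedy algorithm 4(ii), for eight slices with symmetrical access paths, follows. The segment numbering, the function names, and the reading of the tie-breaking rule (compare request lengths sorted longest first) are assumptions made for illustration. The search is exhaustive, which is adequate for at most eight requests but says nothing about how an arbiter in hardware would implement the rule.

```python
from itertools import combinations

NUM_SLICES = 8  # the Ringbus simulated in this section has eight slices


def segments_used(src, offset, n=NUM_SLICES):
    """Segments held by a request from slice `src` to the slice `offset` slices away.

    Hypothetical indexing: segment k joins slice k to slice k+1 (mod n); a positive
    offset travels clockwise, a negative one counterclockwise (symmetrical paths).
    """
    if offset > 0:
        return frozenset((src + j) % n for j in range(offset))
    return frozenset((src - 1 - j) % n for j in range(-offset))


def greedy_grant(requests, n=NUM_SLICES):
    """One arbiter clock cycle of a greedy (maximum-reward) arbiter.

    requests: dict mapping slice -> destination offset, nonnull requests only.
    Returns the set of slices whose requests are granted.
    """
    reqs = [(s, segments_used(s, off, n)) for s, off in requests.items()]
    for size in range(len(reqs), 0, -1):  # largest request sets first
        best, best_key = None, None
        for combo in combinations(reqs, size):
            segs = [held for _, held in combo]
            if sum(len(h) for h in segs) != len(frozenset().union(*segs)):
                continue  # at least two requests need the same segment
            # Tie-break (one reading of the rule): prefer the set whose request
            # lengths, sorted longest first, are lexicographically greatest.
            key = sorted((len(h) for h in segs), reverse=True)
            if best_key is None or key > best_key:
                best, best_key = combo, key
        if best is not None:
            return {s for s, _ in best}  # any remaining tie broken by search order
    return set()


print(greedy_grant({0: 4, 2: 1, 5: 1}))  # {0, 5}
```

In the usage example no three requests can be granted together, so the choice is among pairs; the pair containing the four-segment request wins the tie.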
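The timing of the two phase greedy interval algorithm in heavy traffic can be sketched as follows, under stated assumptions: requests are always pending, so every idle interval has its minimum length t_arb + c; every granted access lasts exactly t_trans; and a grant interval is taken to end when those accesses end. The function name is hypothetical, and the sketch models timing only, not which segments each phase grants.

```python
def grant_intervals(t_trans, t_arb, c, count, offset=0.0):
    """Heavy-traffic timeline of one single phase greedy interval arbiter.

    Assumes requests are always pending, so every idle interval has its minimum
    length t_arb + c, and every granted access lasts exactly t_trans.
    Returns `count` (start, end) grant intervals, in units of c.
    """
    intervals = []
    start = offset
    for _ in range(count):
        end = start + t_trans        # ends when the (equal-length) accesses finish
        intervals.append((start, end))
        start = end + t_arb + c      # minimum idle interval before the next grant
    return intervals


c, t_arb, t_trans = 1.0, 2.0, 7.0    # values used in this section, in units of c
primary = grant_intervals(t_trans, t_arb, c, count=3)
secondary = grant_intervals(t_trans, t_arb, c, count=3, offset=t_arb + c)
print(primary)    # [(0.0, 7.0), (10.0, 17.0), (20.0, 27.0)]
print(secondary)  # [(3.0, 10.0), (13.0, 20.0), (23.0, 30.0)]
```

In this sketch, with t_trans = 7c, t_arb = 2c, and the secondary phase delayed by t_arb + c, every idle interval of one phase falls inside a grant interval of the other, which is the effect the secondary phase was added to exploit.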
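The claim in the note to item 6) can be checked with a rough calculation. Assume every processor on the Multibus is in its exponentially distributed processing phase when the interval starts, which gives the largest request rate. The probability of at least one request in an interval x is then 1 - exp(-N x / t_p). The sketch below, with a hypothetical function name, evaluates the extra probability contributed by the additional .02c over the values of N and t_p used here.

```python
from math import exp


def arrival_prob(interval, n_procs, t_p):
    """P(at least one request within `interval`) from n_procs processors whose
    processing times are exponential with mean t_p, assuming all of them are in
    their processing phase when the interval starts (memoryless property)."""
    return 1.0 - exp(-n_procs * interval / t_p)


c = 1.0
t_arb = 2.0 * c
worst = 0.0
for n_procs in (1, 2, 4):                      # N values simulated in this section
    for t_p in (5.0, 10.0, 20.0, 50.0, 100.0, 200.0):
        diff = (arrival_prob(t_arb + 0.02 * c, n_procs, t_p)
                - arrival_prob(t_arb, n_procs, t_p))
        worst = max(worst, diff)
print(f"largest difference: {worst:.4f}")
```

Over these cases the largest difference stays below .004, consistent with the statement that it is negligible.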

The following tables contain the simulation results. The statistics reported are the mean processor cycle time, t_cycle (the reciprocal of the throughput of the processor), the Multibus waiting time per access, t_w, the Ringbus waiting time per Ringbus access, w_RR, and the mean number of Ringbus grants in progress per arbiter clock period, g. A grant is considered in progress for the total time that at least one Ringbus segment is allocated to the grant. Since segments remain allocated to a grant for a period t_arb after the termination of the Ringbus data transfer time, a grant is in progress for a total time of t_trans + t_arb, which equals 9c for the rotating priority, greedy, and greedy interval algorithms and 8c for the crossbar and common bus interconnections. The ± figures associated with each statistic indicate the corresponding 95% confidence intervals.
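Because β = 0 and ψ = 1, every processor cycle ends with exactly one Ringbus access, so g follows from t_cycle by Little's law: g = (number of Multibuses) · N · (t_trans + t_arb) / t_cycle. The sketch assumes one Multibus per slice (eight in all), and the function name is hypothetical. For example, it reproduces the rotating priority entry of Table 4.6(a) for t_p = 5.0c (t_cycle = 29.7c, g = 2.42) and the common bus entry (t_cycle = 64.05c, g = .9992).

```python
def grants_in_progress(n_slices, n_procs, t_cycle, t_trans, t_arb):
    """Mean number of Ringbus grants in progress per arbiter clock period.

    Little's law, assuming one Multibus per slice and (since beta = 0 and
    psi = 1) exactly one Ringbus access per processor cycle; each grant holds
    at least one segment for t_trans + t_arb. Times are in units of c.
    """
    grant_rate = n_slices * n_procs / t_cycle   # grants per clock period
    return grant_rate * (t_trans + t_arb)       # rate x time in progress


# Rotating priority, asymmetrical paths, N = 1, t_p = 5.0c: t_cycle = 29.7c
print(round(grants_in_progress(8, 1, 29.7, t_trans=7, t_arb=2), 2))   # 2.42
# Common bus (t_arb = c), N = 1, t_p = 5.0c: t_cycle = 64.05c
print(round(grants_in_progress(8, 1, 64.05, t_trans=7, t_arb=1), 4))  # 0.9992
```

The same relation gives a quick consistency check on the tabulated pairs (t_cycle, g) for N = 2 and N = 4 as well.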


Destination probs: asymmetrical    N = 1

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   29.7±2.1    28.3±1.4    23.75±.69   24.81±.56   16.13±.37   64.05±.03
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      14.5±2.1    13.1±1.1    8.59±.80    9.64±.48    2.27±.19    50.02±.33
              g           2.42±.17    2.55±.13    3.03±.09    2.90±.07    3.98±.09    .9992±.0005
t_p = 10.0c   t_cycle/c   30.7±1.6    29.6±1.1    26.21±.88   28.15±.55   20.53±.94   64.09±.11
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      10.8±1.7    9.57±1.19   6.32±.76    8.28±.27    1.67±.30    45.10±.57
              g           2.35±.12    2.43±.09    2.75±.09    2.56±.05    3.13±.14    .998±.002
t_p = 20.0c   t_cycle/c   36.5±1.9    35.1±1.2    33.9±1.6    36.42±.78   29.9±1.3    64.17±.16
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      6.18±.89    5.33±.75    3.98±.76    6.81±.41    1.00±.18    34.83±.58
              g           1.97±.11    2.05±.07    2.12±.10    1.98±.04    2.15±.09    .997±.002
t_p = 50.0c   t_cycle/c   61.1±3.3    62.3±4.8    61.4±4.1    64.59±3.4   58.6±2.7    70.9±2.8
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      2.19±.49    1.86±.22    1.51±.35    4.40±.34    .44±.10     11.6±1.7
              g           1.18±.06    1.16±.09    1.17±.08    1.12±.06    1.09±.05    .90±.04

Table 4.6(a): Ringbus simulation results


† Remember, here g represents the average number of grants in progress per round, by which we mean the average number of grants per round to which one or more segments are allocated, not the average number of grants per round utilizing segments. (Here we consider a grant to be in progress for the total time that at least one Ringbus segment is allocated to the grant. Since segments remain allocated to a grant for a period t_arb after the termination of the Ringbus data transfer time, a grant is in progress for a total time of t_trans + t_arb.)


Destination probs: symmetrical    N = 1

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   38.5±1.4    29.6±1.4    23.80±.40   25.38±.64   16.35±.10   64.05±.03
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      23.2±1.6    14.5±1.5    8.67±.33    10.22±.58   2.54±.26    50.02±.33
              g           1.87±.07    2.43±.11    3.03±.05    2.84±.07    3.92±.10    .9992±.0005
t_p = 10.0c   t_cycle/c   39.14±.74   30.6±1.2    26.75±.43   28.76±.58   20.63±.64   64.09±.11
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      19.3±1.1    10.55±.95   6.90±.66    8.73±.50    1.81±.26    45.10±.57
              g           1.84±.03    2.36±.10    2.69±.04    2.50±.05    3.11±.10    .998±.002
t_p = 20.0c   t_cycle/c   41.9±2.5    36.35±.70   34.4±2.2    37.2±1.3    30.17±.71   64.17±.16
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      12.4±2.1    6.38±.79    4.20±.72    7.05±.48    1.07±.23    34.83±.58
              g           1.72±.10    1.98±.04    2.10±.13    1.94±.07    2.12±.05    .997±.002
t_p = 50.0c   t_cycle/c   63.7±4.7    62.9±3.9    62.3±3.1    65.0±3.2    59.8±3.5    70.9±2.8
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      4.04±.99    2.19±.47    1.59±.24    4.58±.41    .46±.11     11.6±1.7
              g           1.13±.08    1.15±.07    1.16±.06    1.11±.05    1.07±.06    .90±.04

Table 4.6(b): Ringbus simulation results


Destination probs: uniform    N = 1


                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   47.9±1.9    42.9±2.1    29.18±.86   29.26±.53   16.69±.39   64.05±.03
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      32.5±2.1    27.6±2.1    13.95±.78   14.01±.54   2.89±.35    50.02±.33
              g           1.50±.06    1.68±.08    2.47±.07    2.46±.04    3.84±.09    .9992±.0005
t_p = 10.0c   t_cycle/c   48.3±1.9    42.7±2.1    31.19±.71   31.81±.49   20.77±.62   64.09±.11
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      28.2±2.2    22.7±2.1    11.31±.64   11.89±.70   1.92±.31    45.10±.57
              g           1.49±.06    1.69±.08    2.31±.05    2.26±.04    3.09±.09    .998±.002
t_p = 20.0c   t_cycle/c   49.0±1.4    43.8±2.5    37.4±1.6    39.3±1.1    29.5±1.1    64.17±.16
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      19.2±2.2    13.8±3.1    7.31±.63    9.23±.36    1.25±.28    34.83±.58
              g           1.47±.04    1.64±.09    1.92±.08    1.83±.05    2.17±.08    .997±.002
t_p = 50.0c   t_cycle/c   65.9±2.6    65.4±5.4    62.6±3.4    65.7±3.8    59.9±3.3    70.9±2.8
              t_w/c       0.0         0.0         0.0         0.0         0.0         0.0
              w_RR/c      6.46±.90    4.55±1.22   2.96±.64    5.29±.47    .51±.18     11.6±1.7
              g           1.09±.04    1.10±.09    1.15±.06    1.10±.0?    1.07±.06    .90±.04

Table 4.6(c): Ringbus simulation results


Destination probs: asymmetrical    N = 2

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   60.2±2.2    55.8±2.7    44.1±1.4    47.4±1.0    26.79±.45   127.80±.17
              t_w/c       25.1±1.2    22.9±1.4    17.09±.76   18.70±.58   8.55±.34    58.71±.30
              w_RR/c      18.1±1.1    15.8±1.4    9.96±.69    11.66±.52   3.23±.20    53.84±.02
              g           2.39±.09    2.58±.13    3.26±.10    3.03±.07    4.77±.08    .9998±.0002
t_p = 10.0c   t_cycle/c   59.3±2.8    55.9±2.1    44.79±.97   47.46±.48   28.29±.47   127.77±.18
              t_w/c       20.1±1.5    18.42±.87   13.21±.48   14.17±.38   5.58±.20    53.71±.51
              w_RR/c      17.?±1.4    15.5±1.0    9.80±.52    11.24±.25   2.90±.18    53.80±.04
              g           2.42±.11    2.57±.09    3.21±.07    3.03±.03    4.52±.08    .9996±.0003
t_p = 20.0c   t_cycle/c   58.9±2.0    56.0±2.2    47.5±1.0    50.18±.82   34.3±1.1    127.78±.30
              t_w/c       12.5±1.0    11.2±1.0    7.76±.69    8.70±.34    2.93±.20    44.04±.57
              w_RR/c      14.8±1.1    13.2±1.2    8.32±.36    10.08±.30   2.19±.23    53.40±.13
              g           2.44±.08    2.57±.10    3.03±.07    2.86±.05    3.72±.12    .9994±.0006
t_p = 50.0c   t_cycle/c   71.4±2.9    70.2±2.2    66.5±2.2    70.8±1.7    61.7±1.9    128.05±.34
              t_w/c       3.10±.51    2.89±.59    2.12±.43    3.01±.42    .93±.15     22.1±1.4
              w_RR/c      6.86±1.0    6.04±1.08   4.12±.44    7.26±.26    .99±.17     45.78±1.04
              g           2.01±.08    2.05±.07    2.16±.07    2.03±.05    2.07±.06    .998±.002
t_p = 100.0c  t_cycle/c   114.9±5.3   114.6±6.2   112.0±5.6   117.4±5.3   110.5±3.3   133.3±1.8
              t_w/c       .84±.17     .78±.17     .75±.12     1.14±.24    .43±.08     4.16±1.3
              w_RR/c      2.74±.46    2.25±.37    1.82±.?8    4.88±.39    .49±.08     19.4±2.8
              g           1.25±.06    1.25±.07    1.28±.06    1.22±.05    1.16±.04    .958±.013

Table 4.6(d): Ringbus simulation results


Destination probs: symmetrical    N = 2

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   77.3±3.9    56.6±1.7    45.26±.67   47.92±.90   ?           127.80±.17
              t_w/c       3?.6±1.8    23.3±.84    17.72±.43   18.97±.61   8.82±.27    58.71±.30
              w_RR/c      26.6±1.9    16.26±.83   10.56±.32   11.93±.46   3.51±.25    53.84±.02
              g           1.86±.06    2.54±.07    3.18±.05    3.00±.0?    4.67±.08    .9998±.0002
t_p = 10.0c   t_cycle/c   76.4±3.5    56.9±1.1    45.27±.92   48.48±.75   28.82±.54   127.77±.18
              t_w/c       28.3±1.7    18.97±.68   13.32±.60   14.91±.51   5.79±.38    53.71±.51
              w_RR/c      26.0±1.8    16.04±.71   10.06±.42   11.79±.35   3.21±.33    53.80±.04
              g           1.88±.09    2.53±.05    3.18±.06    2.96±.04    4.43±.09    .9996±.0003
t_p = 20.0c   t_cycle/c   78.0±3.2    58.7±2.3    48.30±.54   51.51±.57   34.52±.74   127.78±.30
              t_w/c       21.1±1.6    12.4±1.4    8.02±.41    9.23±.56    2.97±.33    44.04±.59
              w_RR/c      25.4±1.7    14.8±1.4    8.84±.19    10.74±.40   2.31±.23    53.40±.13
              g           1.84±.07    2.45±.10    2.98±.03    2.79±.03    3.70±.08    .9994±.0006
t_p = 50.0c   t_cycle/c   82.3±1.8    71.2±1.3    67.2±1.7    71.7±1.3    61.4±1.8    128.05±.34
              t_w/c       6.19±.82    3.11±.52    2.29±.36    3.18±.19    .93±.08     22.1±1.4
              w_RR/c      14.6±1.0    6.89±.74    4.52±.51    7.76±.44    1.03±.11    45.78±1.04
              g           1.75±.04    2.02±.04    2.14±.05    2.00±.04    2.08±.06    .998±.002
t_p = 100.0c  t_cycle/c   117.0±4.9   114.2±2.6   112.6±3.6   116.4±4.2   110.8±5.8   133.3±1.8
              t_w/c       1.17±.26    .81±.12     .72±.13     1.17±.18    .42±.07     4.16±1.3
              w_RR/c      4.70±.68    2.76±.48    5.91±.0?    5.?8±.33    .52±.10     19.4±2.8
              g           1.23±.05    1.26±.03    1.28±.04    1.23±.04    1.15±.06    .958±.013

Table 4.6(e): Ringbus simulation results




Destination probs: uniform    N = 2

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   96.5±2.8    85.2±2.4    56.1±1.2    57.1±1.0    28.01±.69   127.80±.17
              t_w/c       43.2±1.4    37.5±1.3    23.03±.78   23.48±.54   9.18±.34    58.71±.30
              w_RR/c      36.2±1.4    30.6±1.2    15.99±.61   16.46±.47   3.84±.32    53.84±.02
              g           1.49±.04    1.69±.05    2.56±.05    2.52±.05    4.57±.11    .9998±.0002
t_p = 10.0c   t_cycle/c   97.2±3.7    84.8±3.3    56.48±.83   57.47±.85   29.35±.70   127.77±.18
              t_w/c       38.6±1.9    32.4±2.1    18.79±.60   19.12±.92   6.13±.42    53.71±.51
              w_RR/c      36.5±1.9    30.2±1.8    15.76±.47   16.32±.44   3.54±.30    53.80±.04
              g           1.48±.06    1.70±.07    2.54±.04    2.50±.03    4.36±.10    .9996±.0003
t_p = 20.0c   t_cycle/c   97.5±2.1    84.6±3.4    57.82±.85   58.61±.63   35.1±1.1    127.78±.30
              t_w/c       29.8±1.4    24.1±1.5    12.37±.70   12.53±.68   3.14±.35    44.04±.59
              w_RR/c      35.6±1.3    28.9±1.9    14.11±.46   14.61±.44   2.58±.30    53.40±.13
              g           1.47±.03    1.70±.07    2.49±.04    2.45±.03    3.64±.11    .9994±.0006
t_p = 50.0c   t_cycle/c   97.7±2.5    86.9±2.5    73.0±1.3    74.8±1.2    60.7±2.1    128.05±.34
              t_w/c       11.4±1.7    7.92±.85    3.68±.54    4.13±.47    1.01±.14    22.1±1.4
              w_RR/c      25.2±2.4    17.9±2.0    8.15±.57    9.96±.22    1.19±.14    45.78±1.04
              g           1.47±.04    1.65±.05    1.97±.03    1.92±.03    2.11±.07    .998±.002
t_p = 100.0c  t_cycle/c   121.3±5.3   118.1±4.6   114.2±3.3   118.7±5.2   109.7±5.5   133.3±1.8
              t_w/c       1.83±.49    1.38±.?4    1.00±.28    1.45±.30    .42±.08     4.16±1.3
              w_RR/c      8.0±1.5     5.71±.61    3.57±.23    ?.25±.38    .59±.13     19.4±2.8
              g           1.19±.05    1.22±.05    1.26±.04    1.21±.06    1.17±.06    .958±.013

Table 4.6(f): Ringbus simulation results




Destination probs: asymmetrical    N = 4

                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   118.4±?     112.1±2.3   88.4±1.4    94.1±1.0    53.34±.77   255.11±.30
              t_w/c       83.5±2.3    78.7±1.8    61.1±1.1    65.24±.80   34.82±.56   185.58±.25
              w_RR/c      17.65±.77   16.09±.61   10.14±.35   11.56±.25   3.36±.19    53.93±.01
              g           2.43±.06    2.56±.05    3.25±.05    3.05±.03    4.78±.07    .9999±.0001
t_p = 10.0c   t_cycle/c   117.9±4.1   111.2±3.9   88.0±1.6    93.7±1.5    53.21±.65   254.99±.42
              t_w/c       78.1±3.1    72.9±3.0    55.7±1.3    60.0±1.1    29.73±.60   180.45±.53
              w_RR/c      17.6±1.0    15.9±1.0    10.04±.39   11.49±.37   3.32±.16    53.92±.01
              g           2.43±.08    2.58±.09    3.26±.06    3.06±.05    4.79±.06    .9998±.0002
t_p = 20.0c   t_cycle/c   118.2±4.6   111.1±2.9   88.5±1.7    94.0±1.2    53.78±.86   254.95±.45
              t_w/c       68.4±3.5    62.8±2.3    46.1±1.5    50.13±.86   20.28±.88   170.27±.69
              w_RR/c      17.6±1.2    15.78±.71   10.11±.43   11.52±.29   3.24±.20    53.90±.02
              g           2.43±.10    2.58±.07    3.24±.06    3.05±.04    4.74±.07    .9998±.0002
t_p = 50.0c   t_cycle/c   118.7±3.5   112.2±3.1   92.33±.68   98.0±1.4    67.9±1.2    255.03±.22
              t_w/c       39.9±2.2    35.3±2.9    21.7±1.1    25.00±.85   6.33±.60    140.3±2.2
              w_RR/c      16.55±.82   14.73±.98   8.90±.31    10.77±.32   2.26±.18    53.83±.05
              g           2.42±.07    2.56±.07    3.11±.02    2.93±.04    3.76±.07    .9994±.0004
t_p = 100.0c  t_cycle/c   131.7±2.6   127.9±2.3   122.1±2.9   127.5±2.9   112.5±2.5   255.3±.71
              t_w/c       10.3±2.0    8.51±1.2    5.55±.77    8.06±.94    1.86±.26    91.6±3.6
              w_RR/c      9.39±1.33   7.77±.99    4.94±.37    8.24±.25    1.09±.15    52.56±.39
              g           2.18±.05    2.24±.04    2.35±.06    2.25±.05    2.27±.05    .998±.001
t_p = 200.0c  t_cycle/c   214.5±7.6   214.8±?.2   214.2±8.?   218.6±4.7   216.9±9.2   259.0±2.7
              t_w/c       1.70±.21    1.52±.27    1.37±.19    2.32±.30    .72±.07     17.7±3.2
              w_RR/c      2.99±.42    2.60±.45    1.95±.22    5.29±.31    .51±.04     30.1±3.0
              g           1.34±.05    1.34±.05    1.34±.05    1.31±.03    1.21±.05    .985±.010

Table 4.6(g): Ringbus simulation results


Destination probs: symmetrical    N = 4


                          Rotating    Rotating    Greedy      Interval    Crossbar    Common
Access paths              Asym.       Sym.        Sym.        Sym.                    Bus
t_p = 5.0c    t_cycle/c   154.1±2.9   113.8±3.5   89.8±1.2    95.8±1.1    54.19±.58   255.11±.30
              t_w/c       110.1±2.2   80.1±2.6    62.02±.84   66.58±.84   35.44±.45   185.58±.25
              w_RR/c      26.63±.71   16.52±.85   10.48±.28   12.00±.28   3.57±.14    53.93±.01
              g           1.86±.03    2.52±.07    3.20±.04    3.00±.04    4.71±.05    .9999±.0001
t_p = 10.0c   t_cycle/c   152.9±5.1   114.3±3.5   89.6±1.0    95.9±1.3    54.3±1.2    254.99±.42
              t_w/c       104.1±3.7   75.3±2.6    56.9±.68    61.6±1.2    30.52±.78   180.45±.53
              w_RR/c      26.3±1.3    16.65±.84   10.43±.26   12.02±.31   3.61±.29    53.92±.01
              g           1.88±.0?    2.51±.08    3.20±.04    2.99±.04    4.69±.10    .9998±.0002
t_p = 20.0c   t_cycle/c   154.2±4.4   114.9±2.6   89.9±1.3    96.24±.99   54.99±.64   254.95±.45
              t_w/c       95.1±3.3    65.7±2.1    47.05±.72   51.66±.78   21.06±.81   170.27±.69
              w_RR/c      26.6±1.1    16.77±.64   10.49±.33   12.08±.24   3.54±.18    53.90±.02
              g           1.86±.05    2.50±.05    3.19±.05    2.98±.03    4.64±.0?    .9998±.0002
t_p = 50.0c   t_cycle/c   154.3±3.4   116.0±4.5   94.0±1.2    99.9±1.3    68.5±1.1    255.03±.22
              t_w/c       65.3±2.5    37.7±3.5    22.5±1.1    26.8±1.3    6.44±.54    140.3±2.2
              w_RR/c      26.1±.91    15.8±1.3    9.44±.27    11.44±.28   2.44±.16    53.83±.05
              g           1.86±.04    2.47±.0?    3.05±.04    2.87±.04    3.73±.0?    .9994±.0004
t_p = 100.0c  t_cycle/c   157.9±2.7   130.2±3.1   123.3±2.9   128.4±2.8   112.6±2.5   255.3±.71
              t_w/c       25.8±3.4    9.95±1.7    6.13±.72    8.53±.65    1.98±.18    91.6±3.6
              w_RR/c      20.4±1.2    9.30±1.2    5.58±.53    8.67±.19    1.19±.09    52.56±.39
              g           1.82±.03    2.20±.05    2.33±.0?    2.23±.05    2.27±.05    .989±.001
t_p = 200.0c  t_cycle/c   219.2±5.5   215.3±7.8   215.7±9.0   218.6±7.3   211.4±6.3   259.0±2.7
              t_w/c       2.84±.70    1.62±.30    1.37±.24    2.35±.27    .71±.08     17.7±3.2
              w_RR/c      5.95±.83    3.00±.30    2.11±.20    5.56±.27    .55±.07     30.1±3.0
              g           1.31±.03    1.33±.05    1.33±.06    1.31±.04    1.21±.04    .985±.010

Table 4.6(h): Ringbus simulation results


Arbiter Algorithm           Rotating    Rotating    Greedy      Interval    Crossbar
Access Paths                Asym.       Sym.        Sym.        Sym.

t_p = 5.0c    t_cycle/c     193.5±4.1   170.3±2.8   111.9±1.5   114.20±.95  ?
              t_w/c         139.5±3.0   122.3±2.2   78.5±1.1    80.26±.64   ?
              w_RB/c        36.5±1.0    30.68±.69   16.02±.39   16.61±.21   ?
              g             1.48±.03    1.68±.03    2.57±.04    2.51±.02    ?

t_p = 10.0c   t_cycle/c     193.7±3.3   170.0±4.7   112.2±1.1   114.0±1.2   55.7±.99
              t_w/c         134.8±2.5   116.9±3.5   73.66±.94   75.05±.91   31.6±1.0
              w_RB/c        36.57±.82   30.6±1.1    16.10±.24   16.57±.28   3.94±.25
              g             1.48±.02    1.69±.04    2.56±.02    2.52±.02    4.58±.08

t_p = 20.0c   t_cycle/c     193.6±2.3   171.4±3.1   112.1±1.5   114.1±1.0   56.28±.80
              t_w/c         124.4±2.0   108.0±2.4   63.5±.81    65.2±1.2    22.0±1.0
              w_RB/c        36.48±.61   31.0±.74    16.06±.35   16.59±.23   3.89±.20
              g             1.48±.02    1.67±.03    2.56±.03    2.51±.02    4.53±.06

t_p = 50.0c   t_cycle/c     193.5±4.1   172.1±3.9   113.9±1.4   115.9±1.7   69.3±1.6
              t_w/c         94.3±4.4    78.7±3.3    36.6±1.8    37.8±1.8    6.62±.60
              w_RB/c        36.3±1.1    30.8±1.1    15.22±.48   15.79±.47   2.58±.24
              g             1.48±.03    1.67±.04    2.58±.03    2.47±.03    3.68±.0?

t_p = 100.0c  t_cycle/c     193.0±4.9   171.2±6.4   ?           135.8±2.2   113.2±2.9
              t_w/c         48.4±4.0    34.7±4.4    ?           12.67±.99   2.03±.13
              w_RB/c        32.2±1.6    25.2±1.9    ?           11.65±.27   1.32±.10
              g             1.49±.04    1.68±.06    ?           2.11±.04    2.26±.06

t_p = 200.0c  t_cycle/c     228.0±5.1   220.8±6.3   216.1±5.8   2?1.3±5.6   210.6±11.3
              t_w/c         5.15±1.5    3.43±.66    2.05±.39    2.94±.40    .73±.13
              w_RB/c        10.7±1.7    7.00±.82    3.93±.47    6.82±.23    .?0±.07
              g             1.26±.03    1.30±.04    1.33±.04    1.30±.03    1.21±.06

Table 4.6(i): Ringbus simulation results


The results in Tables 4.6(a) through (i) indicate little variation in the performance with different access paths and arbiter algorithms for light loading, as one would expect, and large variation in the performance for heavy loading. These variations in performance for heavy loading are illustrated in the following table of the throughput with $t_p = 5.0c$ relative to that with rotating priority and asymmetrical access paths.
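As a sketch of how such relative throughputs follow from the simulation data, assume (as in the processor model) that the overall throughput is $NS/t_{cycle}$. Since $N$ and $S$ are the same in every column of a table, the throughput ratio is just the inverse ratio of mean cycle times. The snippet below applies this to the $t_p = 5.0c$ row of Table 4.6(h) (symmetrical destination probabilities); the dictionary keys and the function name are illustrative only.

```python
# Mean cycle times t_cycle/c at t_p = 5.0c, read from Table 4.6(h)
# (symmetrical destination probabilities, N = 4). Mean values only.
T_CYCLE = {
    "Rotating, asym.": 154.1,
    "Rotating, sym.": 113.8,
    "Greedy, sym.": 89.8,
    "Interval, sym.": 95.8,
    "Crossbar": 54.19,
    "Common bus": 255.11,
}


def relative_throughput(t_cycle, baseline="Rotating, asym."):
    """Throughput NS/t_cycle of each case relative to the baseline.

    N and S are identical in every column, so they cancel and the ratio is
    simply t_cycle(baseline) / t_cycle(case).
    """
    base = t_cycle[baseline]
    return {name: base / t for name, t in t_cycle.items()}


if __name__ == "__main__":
    for name, ratio in relative_throughput(T_CYCLE).items():
        print(f"{name:16s} {ratio:5.2f}")
```

On these numbers the greedy arbiter gives roughly 1.7 times, and the crossbar roughly 2.8 times, the throughput of rotating priority with asymmetrical access paths, while the common bus gives about 0.6 times.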
is by their saturation throughput, i.e. the maximum throughput achievable. This is a particularly useful and convenient way to characterize the performance because the saturation point depends only on the arbiter algorithm and the destination probabilities. Table 4.8 lists the saturation throughput $g_{sat}$ (in mean number of grants in progress per arbiter clock period†) with the various arbiter algorithms and access paths considered in this section.


Algorithm     Access Path    Destination Probs.
                             Asym.    Sym.    Uni.

Common bus    n/a            1.0      1.0     1.0
Rotating      Asym.          2.4      1.9     1.5
Rotating      Sym.           2.5      2.5     1.7
Interval      Sym.           3.1      3.0     2.5
Greedy        Sym.           3.3      3.2     2.6
Crossbar      n/a            4.8      4.7     4.6

Table 4.8: Saturation throughput for various
algorithms and destination probabilities


Table 4.8 clearly shows the relative ordering, in terms of saturation throughput, of the various arbiter algorithms and access paths considered. Note that Table 4.8 also clearly shows that the saturation throughput decreases as the destination probabilities change from asymmetrical to symmetrical to uniform.

In all the simulations the greedy arbiter algorithm yielded better performance, although not by much, than the two phase interval algorithm. This was a slightly surprising result considering that, extrapolating from our finding with four slices and $t_p = 0$ in section 3.4, one would expect an
interval algorithm to be optimal for heavy traffic. On closer examination this result is not so surprising. Presumably, the result is a consequence of the nonzero arbitration time. As already mentioned, the single phase interval algorithm yielded poor performance due to the idle interval during which no requests are granted. The two phase interval algorithm is a simple attempt to utilize the Ringbus during the idle period, but it has the consequence of causing additional request conflicts because one phase follows the other by less than the duration of the grants. Ideally one would like the phases to be nonoverlapping, but this has the drawback of imposing a minimum wait of one phase (the duration of a grant) before the next request can be granted at a slice after the previous grant at that slice terminates. Thus there seems to be no way to avoid some performance penalty due to the nonzero arbitration time when implementing an interval-type algorithm. This suggests that it is important to include the effect of nonzero arbitration time when trying to determine an optimum arbiter algorithm for the actual Concert system.

† As before, a grant is considered to be in progress for the total time that at least one Ringbus segment is allocated to the grant.

The interval algorithm suffers from another disadvantage: in light traffic the synchronization of the requests with the phases adds to the total waiting time of a request. In some cases with light loading, the throughput with the two phase interval algorithm is actually less than that of the rotating priority algorithm with symmetrical access paths. This suggests the obvious: for best performance the arbiter should be able to change algorithms to adapt to changing load conditions.

The overall throughput of the Concert system is $NS/t_{cycle}$, where $t_{cycle}$ is the mean cycle time of a single processor. (Recall that $t_{cycle} = t_p + \beta t_r + t_w + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}\big)$.) As a function of $t_p$, the overall throughput is maximum at $t_p = 0$, monotonically decreases as $t_p$ increases, and is asymptotic to a curve in the family $NS/(t_p + K)$, with $K$ a constant. Because of the nonlinear asymptote of the overall throughput it is more convenient to deal with the mean cycle time, for which an equivalent statement is: as a function of $t_p$, $t_{cycle}$ is minimum at $t_p = 0$, monotonically increases as $t_p$ increases, and is asymptotic to

$$t_{cycle} = t_p + \beta t_r + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}^{min}\big).$$

This leads to the following simple first order approximation of the mean cycle time, and hence of the overall throughput, as a function of $t_p$:

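$$t_{cycle} \approx \begin{cases} t_{cycle}^{min} & \text{for } t_p + \beta t_r + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}^{min}\big) < t_{cycle}^{min} \\ t_p + \beta t_r + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}^{min}\big) & \text{otherwise} \end{cases} \qquad (4.1)$$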

$t_{cycle}^{min}$ is the value of $t_{cycle}$ when $t_p = 0$ (with all other parameters fixed). Equation 4.1 is a convenient approximation since it depends on only one parameter, $t_{cycle}^{min}$, aside from the fixed input parameters. Furthermore, $t_{cycle}^{min}$ can be related to the Ringbus throughput when $t_p = 0$, which we denote by $g^0$, as follows.

First, when $t_p = 0$ a request from a processor must wait for the requests of each of the other $N - 1$ processors to complete before it can proceed. Hence

$$t_{cycle}^{min} = \beta t_r + N(1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}\big)$$

Second, recall that $t_{aRB} = w_{RB} + t_{aRB}^{min}$, and thus


d  epo  _  -  t’P  o 

- - y-  wkr  +  d  - - f-  tann 

l-/> 0  ]  ~P0 


(Note that $t_{aRB}^{min}$ is not the same as $t_{aRB}$, since $t_{aRB}$ is a function of the Ringbus loading.)

Third, the mean spacing between the termination of one Ringbus request and the arrival of the next Ringbus request at the same slice is equal to $\frac{1-\phi}{\phi}t_{aMB}$. Therefore

$$g^0 = \frac{Sd}{\frac{1-\phi}{\phi}t_{aMB} + t_{aRB}} = \frac{\phi S d}{(1-\phi)t_{aMB} + \phi t_{aRB}}$$

from which it follows that

$$t_{cycle}^{min} = \beta t_r + \frac{\phi(1+\beta)NSd}{g^0} \qquad (4.2)$$

(provided that $g^0 \neq 0$).


If the Ringbus throughput is saturated when $t_p = 0$ (note that it need not be saturated for small enough $\phi$), then $g^0 = g_{sat}$ and

$$t_{cycle}^{min} = \beta t_r + \frac{\phi(1+\beta)NSd}{g_{sat}} \qquad (4.3)$$

Note that while $g^0$ may depend on $\beta$ and $\phi$, $g_{sat}$ does not. Hence equation 4.3 allows the determination of $t_{cycle}^{min}$ as a function of $\beta$, $\phi$, and $N$, provided that the Ringbus remains saturated for $t_p = 0$.

Note that equations 4.1 and 4.2 also allow the results obtained in this section for $\beta = 0$ and $\phi = 1$ to be extrapolated to other values of $\beta$ and $\phi$.
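Equations 4.1 and 4.3 can be sketched as a few small functions. This is a minimal illustration, not the author's code: all times are in units of the arbiter clock period $c$, the function names are hypothetical, and the example operating point ($N = 4$, $\phi = 0.8$, $\beta = 0$) is chosen only for illustration, using the Concert values quoted with Table 4.10 ($S = 8$, $d = 9c$, $t_{aMB} = 5.5c$, $t_{aRB}^{min} = 10.5c$) and $g_{sat} = 2.4$ from Table 4.8 (rotating priority, asymmetrical access paths and destination probabilities).

```python
"""Sketch of equations 4.1 and 4.3; all times in units of the arbiter clock period c."""


def t_cycle_min_saturated(N, S, d, g_sat, phi, beta=0.0, t_r=0.0):
    """Equation 4.3: t_cycle^min, valid only if the Ringbus is saturated at t_p = 0."""
    if g_sat <= 0:
        raise ValueError("g_sat must be positive")
    return beta * t_r + phi * (1 + beta) * N * S * d / g_sat


def t_cycle_approx(t_p, t_min, phi, t_aMB, t_aRB_min, beta=0.0, t_r=0.0):
    """Equation 4.1: the larger of t_cycle^min and the large-t_p asymptote."""
    asymptote = t_p + beta * t_r + (1 + beta) * ((1 - phi) * t_aMB + phi * t_aRB_min)
    return max(t_min, asymptote)


def overall_throughput(t_cycle, N, S):
    """Overall throughput NS / t_cycle (processor cycles completed per c)."""
    return N * S / t_cycle


if __name__ == "__main__":
    # Hypothetical operating point: N = 4, phi = 0.8, beta = 0.
    t_min = t_cycle_min_saturated(N=4, S=8, d=9.0, g_sat=2.4, phi=0.8)
    for t_p in (0.0, 50.0, 100.0, 200.0):
        t_c = t_cycle_approx(t_p, t_min, phi=0.8, t_aMB=5.5, t_aRB_min=10.5)
        print(f"t_p = {t_p:6.1f}c  t_cycle ~ {t_c:6.1f}c  "
              f"throughput ~ {overall_throughput(t_c, 4, 8):.3f}/c")
```

For this point $t_{cycle}^{min} = 96c$ and the asymptote is $t_p + 9.5c$, so the approximation stays flat up to a knee at $t_p = 86.5c$ and then grows linearly with $t_p$.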
4.4  Simulation  II:  The  Actual  Concert  System

In this section we present the results of a series of simulations of the actual Concert system, as implemented, in order to give some idea of the performance of this system and how it varies under the influence of various parameters. The simulation model and the manner in which the simulations were performed are the same as for the simulations in sections 4.2 and 4.3. All the assumptions and parameters are the same as those listed in section 4.3 except for the following:

Ringbus arbitration algorithm: rotating priority (counterclockwise priority rotation) as in Concert
Access paths: asymmetrical as in Concert

Ringbus destination probabilities: asymmetrical (as listed in Table 4.5). We take these probabilities as asymmetrical to show the Ringbus (with asymmetrical access paths) in its best light and to correspond to the expected asymmetrical bias in the request probabilities. We expect that most applications will be structured to take advantage of the more favourable clockwise direction for accesses, implying an asymmetrical bias in the request probabilities.

Ringbus access probability: We take $\phi = .2$, .4, .6, and .8 to illustrate a range of operating conditions. Note that the performance with $\phi = 0$ (no Ringbus accesses) is given by the isolated Multibus model of section 2.9 and the performance with $\phi = 1$ (only Ringbus accesses) is given by the results in section 4.3.

Arbiter clock period: $c = 200$ nsec.

Multibus access time distribution: We assume a deterministic access time with duration 1.10 μsec = 5.5c. (We arrived at this duration by assuming that all the Multibus accesses of a slice are directed towards the slice global memory and that the Ringbus port of this global memory is lightly loaded. In the actual Concert system, the mean Multibus access time of slice global memory is about 1.10 μsec when the Ringbus port is heavily loaded and about 1.05 μsec when the Ringbus is lightly loaded. (See section 3.3 of Appendix A.) Thus our assumed 1.10 μsec duration is slightly pessimistic for most cases.)

As before, we assume that the start-up time is zero, that there are no long word accesses (i.e. $\beta = 0$), that the Ringbus data transfer time is deterministic with duration 7c, and that the Ringbus arbitration time is deterministic with duration 2c.

The simulation results are listed in Tables 4.9(a), (b), and (c).


N = 1

                            φ = .2       φ = .4       φ = .6       φ = .8

t_p = 5.0c    t_cycle/c     11.88±.47    14.52±.64    19.35±.98    24.3±1.6
              t_w/c         0.0          0.0          0.0          0.0
              w_RB/c        2.50±1.3     5.73±1.26    9.84±1.40    12.4±1.8
              g             1.22±.15     1.94±.09     2.28±.12     2.39±.12

t_p = 10.0c   t_cycle/c     17.0±1.4     18.82±.88    22.06±.61    26.2±1.2
              t_w/c         0.0          0.0          0.0          0.0
              w_RB/c        1.49±.88     3.71±1.03    6.26±.94     8.55±1.0
              g             .83±.17      1.52±.14     1.98±.11     2.22±.09

t_p = 20.0c   t_cycle/c     26.5±1.2     27.83±.89    30.1±1.4     32.9±1.0
              t_w/c         0.0          0.0          0.0          0.0
              w_RB/c        .84±.56      1.80±.63     3.03±.79     4.56±1.4
              g             .56±.06      1.03±.09     1.44±.07     1.76±.07

t_p = 50.0c   t_cycle/c     56.1±4.6     58.8±3.2     59.2±4.0     61.4±3.1
              t_w/c         0.0          0.0          0.0          0.0
              w_RB/c        .30±.20      .73±.48      1.12±.22     1.62±.24
              g             .24±.03      .48±.05      .74±.06      .94±.05

t_p = 100.0c  t_cycle/c     105.5±10.7   109.5±8.3    111.0±7.3    110.1±8.0
              t_w/c         0.0          0.0          0.0          0.0
              w_RB/c        .16±.15      .28±.10      .52±.22      .79±.22
              g             .14±.02      .26±.04      .39±.04      ?
N = 2

                            φ = .2       φ = .4       φ = .6       φ = .8

t_p = 5.0c    t_cycle/c     16.33±.97    24.6±2.2     35.3±1.8     48.2±4.1
              t_w/c         3.78±.40     7.74±1.13    12.98±.90    19.2±2.1
              w_RB/c        4.66±1.16    10.4±2.4     14.0±1.6     16.8±1.8
              g             1.79±.09     2.32±.16     2.45±.16     2.39±.14

t_p = 10.0c   t_cycle/c     19.69±.54    25.8±1.4     36.2±1.7     47.8±3.8
              t_w/c         2.27±.22     4.94±.68     9.65±.67     14.9±1.8
              w_RB/c        3.50±.92     8.09±1.56    12.9±1.1     15.6±2.4
              g             1.47±.11     2.20±.10     2.37±.11     2.42±.20

t_p = 20.0c   t_cycle/c     27.99±.78    32.2±1.2     39.1±2.0     49.1±1.4
              t_w/c         1.18±.09     2.34±.26     4.85±1.13    8.78±.69
              w_RB/c        1.88±.44     4.95±1.19    8.80±1.48    12.7±1.0
              g             1.03±.07     1.77±.06     2.19±.08     2.35±.07

t_p = 50.0c   t_cycle/c     57.9±2.5     58.5±2.8     61.5±3.0     65.8±2.1
              t_w/c         .42±.05      .69±.09      1.15±.16     1.97±.44
              w_RB/c        .68±.27      1.60±.27     3.11±.66     4.93±.97
              g             .50±.07      .98±.06      1.40±.07     1.74±.05

t_p = 100.0c  t_cycle/c     107.3±5.8    108.2±5.2    109.6±5.7    111.6±6.1
              t_w/c         .22±.05      .31±.07      .44±.10      .61±.10
              w_RB/c        .33±.17      .84±.31      1.29±.17     1.96±.21
              g             .28±.03      .53±.04      .79±.06      1.03±.05

t_p = 200.0c  t_cycle/c     ?            208.1±10.2   210.4±16.0   210.2±4.5
              t_w/c         .10±.03      .15±.07      .20±.04      .23±.06
              w_RB/c        .20±.07      .39±.14      .63±.22      .86±.13
              g             .14±.01      .28±.02      .41±.04      .55±.01

Table 4.9(b): Concert simulation results
Table  4.9(c):  Concert  simulation  results
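To make the size of the waiting time concrete, the sketch below takes the mean waiting time per cycle to be $t_w + \phi w_{RB}$ (Multibus waiting plus Ringbus waiting, the latter incurred only on the fraction $\phi$ of accesses that use the Ringbus) and expresses it as a fraction of $t_{cycle}$. It uses only the mean values from the $N = 2$, $t_p = 5.0c$ rows of Table 4.9(b) for $\phi = .2$, .6 and .8; the function name is illustrative.

```python
"""Sketch: waiting time per cycle, t_w + phi * w_RB, as a fraction of t_cycle."""

# Table 4.9(b), N = 2, t_p = 5.0c: phi -> (t_cycle/c, t_w/c, w_RB/c), mean values only.
ROWS = {
    0.2: (16.33, 3.78, 4.66),
    0.6: (35.3, 12.98, 14.0),
    0.8: (48.2, 19.2, 16.8),
}


def waiting_fraction(phi, t_cycle, t_w, w_RB):
    """Fraction of the mean cycle spent waiting for the Multibus or the Ringbus."""
    return (t_w + phi * w_RB) / t_cycle


if __name__ == "__main__":
    for phi, (t_cycle, t_w, w_RB) in ROWS.items():
        print(f"phi = {phi}: {waiting_fraction(phi, t_cycle, t_w, w_RB):.0%} of the cycle")
```

At $\phi = .8$ roughly two thirds of each processor cycle is spent waiting, compared with under a third at $\phi = .2$.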


Examining Tables 4.9(a), (b), and (c) we can see that the mean total waiting time (or wasted time) per cycle, given by $t_w + \phi w_{RB}$, can be quite large. This waiting time is largest, naturally, for a given set of parameters when the overall throughput, and the Ringbus throughput in particular, is saturated. We can derive a necessary condition for the saturation of the Ringbus throughput as follows.

First, the overall throughput must be above the "knee point". Referring to the approximation in equation 4.1, the overall throughput is above the knee point when

$$t_p + \beta t_r + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}^{min}\big) < t_{cycle}^{min}$$


(Recall that $t_{cycle}^{min} = \beta t_r + N(1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}\big)$ and $t_{aRB} > t_{aRB}^{min}$.) Second, $g^0$ must equal $g_{sat}$. Therefore, by equation 4.3, a necessary condition for the saturation of the Ringbus is


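$$t_p + (1+\beta)\big((1-\phi)t_{aMB} + \phi t_{aRB}^{min}\big) < \frac{\phi(1+\beta)NSd}{g_{sat}}$$

or, on rearranging,

$$\frac{t_p}{1+\beta} + t_{aMB} < \phi\left(\frac{NSd}{g_{sat}} + t_{aMB} - t_{aRB}^{min}\right) \qquad (4.6)$$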


In Table 4.10 we list this inequality for various values of $N$ and destination probabilities in the actual Concert system (i.e. 8 slices, rotating priority algorithm, and asymmetrical access paths) with $\beta = 0$. Note that $d = 9c$, $t_{aMB} = 5.5c$, and $t_{aRB}^{min} = 10.5c$ (from Appendix A), independent of the destination probabilities.


Destination Probs.    N    Necessary condition for saturation

Asymmetrical          1    t_p/c + 5.5 < 25φ
                      2    t_p/c + 5.5 < 55φ
                      4    t_p/c + 5.5 < 115φ
Symmetrical           1    t_p/c + 5.5 < 33φ
                      2    t_p/c + 5.5 < 71φ
                      4    t_p/c + 5.5 < 146φ
Uniform               1    t_p/c + 5.5 < 43φ
                      2    t_p/c + 5.5 < 91φ
                      4    t_p/c + 5.5 < 187φ

Table 4.10: Necessary conditions for Ringbus saturation in the actual Concert system


Operation in the saturated region of throughput is undesirable because of the associated large waiting times. Inequality 4.6 provides a means to adjust parameters so as to avoid, where possible, operation in this region.
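The entries of Table 4.10 can be reproduced from inequality 4.6 with $\beta = 0$. The sketch below is an illustration under stated assumptions: it takes $S = 8$, $d = 9c$, $t_{aMB} = 5.5c$ and $t_{aRB}^{min} = 10.5c$ as quoted above, and the $g_{sat}$ values of Table 4.8 for rotating priority with asymmetrical access paths (2.4, 1.9 and 1.5 for asymmetrical, symmetrical and uniform destination probabilities). The function names are hypothetical.

```python
"""Sketch: coefficient of phi in inequality 4.6 (beta = 0), as tabulated in Table 4.10."""

S, D, T_AMB, T_ARB_MIN = 8, 9.0, 5.5, 10.5  # units of c, values quoted with Table 4.10

# Saturation throughput of rotating priority with asymmetrical access paths (Table 4.8).
G_SAT = {"asymmetrical": 2.4, "symmetrical": 1.9, "uniform": 1.5}


def saturation_coefficient(N, g_sat, S=S, d=D, t_aMB=T_AMB, t_aRB_min=T_ARB_MIN):
    """K such that inequality 4.6 reads t_p/(1+beta) + t_aMB < K * phi."""
    return N * S * d / g_sat + t_aMB - t_aRB_min


def tp_bound(N, g_sat, phi, beta=0.0, t_aMB=T_AMB):
    """Largest t_p (in c) for which the necessary condition can hold; <= 0 means never."""
    return (1 + beta) * (phi * saturation_coefficient(N, g_sat, t_aMB=t_aMB) - t_aMB)


if __name__ == "__main__":
    for probs, g_sat in G_SAT.items():
        for N in (1, 2, 4):
            K = saturation_coefficient(N, g_sat)
            print(f"{probs:12s} N = {N}:  t_p/c + 5.5 < {K:.1f} phi")
```

The computed coefficients agree with Table 4.10 to within the rounding of $g_{sat}$ (the symmetrical, $N = 4$ entry comes out as about 146.6). For $N = 1$, $\phi = .8$ and asymmetrical destination probabilities the bound is $t_p < 14.5c$, which is consistent with Table 4.9(a), where $g$ is close to 2.4 at $t_p = 5.0c$ but well below it at $t_p = 20.0c$.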
Conclusions


Since conclusions specific to the Multibus and Ringbus have already been presented, this chapter covers the general conclusions that can be drawn from the research in this thesis.

We  can  now  address  the  three  questions  raised  in  the  Introduction. 

What  is  the  performance  of  the  Concert  Multiprocessor? 

We still cannot answer this question directly because the performance depends on the models employed (which may be dictated by the application programs) and the model parameters (which certainly depend on the application programs). However, we have developed techniques to determine the performance. Assuming the simple processor model presented in Chapter 1, we have shown how to determine analytically the performance, using throughput as the metric, for any Concert-like system. This analytical approach involves decomposing the overall system into Multibus and Ringbus subsystems, which may be modeled in isolation using the models formulated in Chapters 2 and 3, and then integrating these models, using the procedure in section 4.22, to determine the throughput. The integration procedure is in fact an approximation based on matching the first moments of the interactions between the Multibus and Ringbus models. More accurate results than this procedure yields can be obtained via simulation. Simulation is also the preferred method for including features which are difficult or cumbersome to handle in the analytical models, and for handling system sizes, such as eight slices, that are too complex for the analytical approach.

The performance of the actual Concert system with eight slices has been established for several different parameter sets by the simulation results presented in section 4.4.


Why  is  the  performance  as  it  is?  What  factors  influence  the  performance? 

The performance of Concert, as modeled in this thesis, depends critically on the parameters of the simple processor model. The performance is especially sensitive to the mean processing time, $t_p$, and the probability of a Ringbus access, $\phi$.

The effect of different Ringbus architectures and different Ringbus arbiter algorithms on the performance is small except when the Ringbus is heavily loaded, in which case these factors can be significant.

How can the performance be improved?

There are two orthogonal ways in which the performance can be improved:

1)  change  the  physical  structure,  or 

2)  change the input parameters, i.e. change the characteristics of the application programs.

The more obvious changes in physical structure have already been discussed in the conclusions of Chapters 2 and 3. An important part of the work in this thesis has been establishing the ultimate performance that can be attained with Ringbus-like schemes.

The desirable changes in the input parameters are again rather obvious: localize the processing as much as possible. However, the work in this thesis enables the quantification of the performance improvement resulting from any change in the input parameters. Such quantification is important: it serves as a directional derivative in the performance-action space.

One activity is still required to complete the first cycle in the iterative process of performance modeling: a comparison of the predicted performance, based on the simple processor model with parameters obtained from actual programs, with the actual performance obtained with the same programs. The purpose of such a comparison is to establish where the processor model and other models need the most improvement, and perhaps how to improve them. Certainly, the processor model needs to be more specific and more oriented to the application program. As discussed in Chapter 2, higher level models should be considered in future cycles of the modeling effort.
Measurement  Details


This appendix describes how actual measurements of processing, recovery, access, and waiting times were obtained. For convenience, these terms are defined in section 2 (as well as in the main text). Measured access times under different conditions are given in section 4.

1.  Background

Three types of Multibus and Ringbus accesses may occur: byte (8 bits), word (16 bits), and long word (32 bits). A word access consists of two simultaneous byte accesses (a high byte and a low byte). Consequently, byte and word accesses are indistinguishable to an observer of the Multibus or Ringbus unless the observer examines the BHEN* (byte high enable) signal on the Multibus (see the Multibus 796 specification [I4] for details) or the BYTE/WORD signal on the Ringbus (see Anderson [A2] for details). In particular, a byte and a word have the same access time distribution.† A long word access consists of two consecutive word accesses (since the Multibus and Ringbus are 16 bits wide).

Timing diagrams for the three types of accesses are given in Figures A.1 and A.2. The diagrams depict the essential features of the Multibus operation from the point of view of the processor originating the access. The relative duration and timing of the signals shown are only approximate. BREQ* and BPRN* refer to the Multibus request and grant signals for the originating processor; MRDC* and MWTC* refer to the Multibus read and write signals respectively; and XACK* refers to the Multibus acknowledge signal.


† We assume that the time difference between a byte and a word access is negligible. Measurements made on the Multibus and Ringbus with only a single port of the dual ported memories loaded support this assumption (see section 4). We see no reason for our finding in this matter to change when both memory ports are loaded.
Note from Figures A.2(a) and A.2(b) that control of the Multibus (and hence of the Ringbus) is relinquished between the successive word accesses of a long word access.

2.  Definitions 

All the following quantities are defined with respect to the rising edges of BCLK* (the Multibus clock signal). Let the Multibus request and grant signals for processor $i$ be denoted by BREQ_i* and BPRN_i* respectively.

The processing time, denoted by $t_p$, for some processor $i$ is the interval from the first rising edge of BCLK* after BREQ_i* goes high at the end of an access to the first rising edge of BCLK* after BREQ_i* next goes low. In the case of a long word access, the end of the second word access is interpreted as the end of the long word access. Thus the interval between the two successive word accesses of a long word is not called a processing time.

The access time, denoted by $t_a$, for the access of some processor $i$ is the interval from the first rising edge of BCLK* before BPRN_i* goes low to the first rising edge of BCLK* after BREQ_i* goes high.

The recovery time, denoted by $t_r$, for the long word access of some processor $i$ is the interval from the first rising edge of BCLK* after BREQ_i* goes high at the end of the first word access to the first rising edge of BCLK* after BREQ_i* goes low for the second word of the long word access.

The waiting time, denoted by $t_w$, for any request for use of the Multibus by processor $i$ is the interval from the first rising edge of BCLK* after BREQ_i* goes low to the first rising edge of BCLK* before BPRN_i* goes low. In the absence of any other traffic on the Multibus, there is always exactly one rising edge of BCLK* after BREQ_i* goes low and before BPRN_i* goes low, yielding $t_w = 0$.

The above definitions were chosen so as to meet the following two constraints: 1) the access time must include the total time that Multibus resources are allocated to a particular processor, and 2) the waiting time must be zero for a single processor on a Multibus. The time that Multibus resources are allocated to a processor is determined by the Multibus arbiter, which is a small finite state machine clocked on the rising edge of BCLK*. We chose to regard the allocation of Multibus resources as being decided on the rising edge of BCLK*. This view is not unique: we could just as well have chosen the Multibus resources to be allocated on the edges of BPRN_i*. However, our choice has three advantages: 1) the allocation instants are easily demarcated by BCLK*, 2) the waiting time can be defined so that it is easily demarcated by BCLK* (so it is easy to measure) and it is zero for a single processor, and 3) our hardware monitor (the DSD; see the next section) also samples all signals on the rising edge of BCLK*. Note that our definition of access time includes the delay of the Multibus arbiter. This must be the case if we are to meet our first constraint, since any delay effectively increases the duration of any allocation of resources.

The previous definitions are depicted in Figure A.3.
Figure A.3: Illustration of definitions

The measurements presented in section 4 indicate that there is little difference in access times for reads and writes; thus the MRDC* and MWTC* lines are omitted from Figure A.3.
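The edge-based definitions above can be sketched as a small trace-analysis routine. This is a hypothetical illustration, not the procedure actually used with the logic analyzer. It assumes a trace sampled once per BCLK* rising edge (as the hardware monitor does), recording for each edge whether BREQ_i* and BPRN_i* were low just before that edge. It takes the BCLK* period to be 100 nsec (the step size reported in section 3.1) and ignores long word accesses and recovery times. The function name `measure` and the trace format are illustrative only.

```python
from dataclasses import dataclass

BCLK_PERIOD_NS = 100  # assumed BCLK* period (times in section 3.1 come in 100 nsec steps)


@dataclass
class Access:
    wait_ns: int    # waiting time t_w
    access_ns: int  # access time t_a


def measure(breq_low, bprn_low, period_ns=BCLK_PERIOD_NS):
    """Apply the section 2 definitions to a trace sampled at BCLK* rising edges.

    breq_low[k] / bprn_low[k] is True if BREQ_i* / BPRN_i* was low just before
    rising edge k. Assumes BPRN_i* is asserted only after the arbiter has seen
    BREQ_i* at an edge; long word accesses are not handled.
    Returns (list of Access, list of processing times in ns).
    """
    accesses, processing = [], []
    req_edge = grant_edge = end_edge = None
    for k in range(1, len(breq_low)):
        if breq_low[k] and not breq_low[k - 1]:
            # BREQ_i* went low between edges k-1 and k: first edge after is k.
            req_edge = k
            if end_edge is not None:
                processing.append((k - end_edge) * period_ns)
        if bprn_low[k] and not bprn_low[k - 1] and req_edge is not None:
            # BPRN_i* went low between edges k-1 and k: edge before it is k-1.
            grant_edge = k - 1
        if not breq_low[k] and breq_low[k - 1] and grant_edge is not None:
            # BREQ_i* went high: the access ends at the first edge after, k.
            end_edge = k
            accesses.append(Access((grant_edge - req_edge) * period_ns,
                                   (k - grant_edge) * period_ns))
            req_edge = grant_edge = None
    return accesses, processing


if __name__ == "__main__":
    # Single processor on the Multibus: two accesses separated by processing.
    breq = [False, False, True, True, True, True, False,
            False, False, False, True, True, True, False]
    bprn = [False, False, False, True, True, True, False,
            False, False, False, False, True, True, False]
    print(measure(breq, bprn))
```

For this single-processor trace each request sees exactly one BCLK* edge between BREQ_i* going low and BPRN_i* going low, so both waiting times come out as zero, as the second constraint requires.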

3.  Time  Measurements 

All the measurements reported in this section were taken with a digital logic analyzer with 10 nsec clock resolution,† according to the definitions given in section 2. The measurements were performed on three slices of the Concert system connected by the Ringbus, with a Ringbus arbiter clock period of 200 nsec. All the processor and memory boards were Microbar DBC68K and DHR50 models respectively, with all options set as listed in Appendix C. All the measurements were repeated for several different processor-memory pairs on different slices. No noticeable differences were observed between the repetitions; thus we present the following measurements as if only one set of measurements were taken for each case.


† A Gould Biomation digital logic analyzer.
3.1  Minimum  Processing  Time

Executing the assembly language program

loop:  bra   loop

(corresponding to the single instruction word 60fe) from a non-local memory gives the smallest possible processing time. The processing time in this case is the time it takes to decode the instruction word 60fe and initiate the fetch for the next instruction. The minimum observed processing time in this case was 600 nsec; the processing times varied almost uniformly from 600 nsec to 900 nsec (in 100 nsec steps, since the time is measured with respect to BCLK* rising edges).

To determine the smallest possible processing time for a program executing out of local memory we ran the following assembly language program:

loop:  movb  a4@, a5@
       movb  a4@, a5@
       bra   loop

The movb a4@, a5@ instruction reads the byte at the address stored in address register a4 and writes the byte at the address stored in address register a5. We stored the loop containing the movb instructions in a processor's local memory, installed non-local addresses in address registers a4 and a5, and measured the minimum processing time of the movb instruction. There are actually two different processing times associated with the movb a4@, a5@ instruction: the interval between completion of the byte read and initiation of the byte write within one movb a4@, a5@ instruction, and the interval between completion of the byte write of one movb a4@, a5@ instruction and initiation of the byte read of the following movb a4@, a5@ instruction. The intra-instruction processing time (i.e. the former of the two) was 600 nsec about half the time and 700 nsec the other half. The inter-instruction processing time (i.e. the latter) varied from 1.20 to 1.40 μsec.

We also considered the minimum processing time of a program executing out of non-local
memory subject to the restriction of one non-local memory access per instruction. To determine
this minimum, we ran the following assembly language program:

loop:   movb    d7, a5@
        movb    d7, a5@
        bra     loop

The movb d7, a5@ instruction writes the byte in data register d7 to the address contained in
address register a5. We stored the loop containing the movb instructions in a processor's local
HSB memory, installed a non-local address in address register a5, and measured the processing
time of the movb instruction. This processing time consists of the time to fetch the single word
movb instruction, decode it, and initiate the byte write on the Multibus. The processing time of
the movb d7, a5@ instruction was consistently 1.50 μsec. We also tried the movb a5@, d7 instruction,
corresponding to a byte read, and also measured 1.50 μsec.

3.2  Recovery Time

The distribution of recovery time between the successive word accesses of a long word access
was the same for reads and writes: approximately half of the time the recovery time was 600 nsec
and the other half of the time it was 700 nsec, yielding a mean of 650 nsec.

3.3  Access Time

Since all the memory boards are dual ported, we have to consider the effect of traffic on one
port of a memory board on the access time via the other port. In all cases we found no difference
in the access time distributions for bytes and words, nor in the access time distributions for the two
words of a long word access.

3.3.1  Multibus  Access  Time 

3.3.1.1  Multibus Access Time with Other Memory Port Unloaded

In this case the access time distribution was approximately the same for reads and writes,
with a minimum access time of 1.00 μsec and a maximum of 1.30 μsec. The actual observed
distributions are given in Figure A.4 below.

Figure A.4(a): Multibus read access time - other port unloaded

Figure A.4(b): Multibus write access time - other port unloaded


3.3.1.2  Multibus Access Time with Other Memory Port Loaded


We considered two situations: 1) accessing the local memory of another processor via the
Multibus while that processor is loading the HSB port of the memory, and 2) accessing the global
memory of a slice via the Multibus while other processors access it via the Ringbus.


1)  Accessing  the  local  memory  of  another  processor: 


We loaded the HSB port of the local memory by having the associated processor execute

loop:   bra     loop

out of the local memory. We observed no noticeable difference between the access time
distributions for reads and writes via the Multibus. As indicated in Figure A.5, the access times varied
from 1.00 μsec to 1.80 μsec.
Figure A.5: Multibus access time - HSB port loaded


2)  Accessing the global memory of the slice:

We loaded the Ringbus port of the slice global memory with 3 processors on another slice
and 2 processors on yet another slice, all executing

loop:   bra     loop

out of the first slice's global memory (i.e. over the Ringbus). Figure A.6 shows the resulting access
time distribution for Multibus accesses to the slice global memory. We observed no noticeable
difference in the distribution between reads and writes.
Figure A.6: Multibus access time of slice global memory - Ringbus port loaded

3.3.2  Ringbus Access Time

Figure A.7 depicts a Ringbus read access (byte or word), combining the points of view of the
originating processor and the Ringbus, for a single processor in a slice.


Figure A.7: Typical Ringbus read access

REQ* is the Ringbus request signal for the slice; it indicates that the slice's Ringbus interface has
detected an access that requires the Ringbus. ENM* (short for enable Multibus) is the Ringbus grant
signal for the slice; it indicates that the Ringbus request has been allocated the necessary Ringbus
segments. LCLK is the Ringbus arbiter clock signal. The Multibus and the Ringbus operate
asynchronously with respect to each other, thus BCLK* and LCLK are not synchronized. Since REQ*
is generated from Multibus signals, it is not synchronous with LCLK. On the other hand, ENM*
is generated by the arbiter, so it is synchronous with LCLK.

We define a number of quantities with respect to the diagram in Figure A.7 as follows:

$t_a$ is the Ringbus access time (as defined earlier).

$t_{(norm)}^{BPRNtoREQ}$ is the normal time from initiation of a Ringbus read access† to generation of a
Ringbus request. We discuss shortly what normal means in this context.

$t_{latch}$ is the interval between the generation of a Ringbus request and the arbiter latching it in
on a rising LCLK edge.

$t_{start}$ is the overhead associated with the start of a Ringbus access. It is the interval from the
initiation of the access† to the latching of the Ringbus request by the arbiter.

$t_{arb}$ is the arbitration delay.

$t_{trans}$ is the data transfer time. It is the interval from the end of the arbitration delay to the
termination of the access.† Note that in actuality, data transfers on the Ringbus occur in the
interval between the end of the arbitration delay and the falling edge of XACK*. Thus $t_{trans}$
should be interpreted as the total interval in which a data transfer could occur, not as the
interval in which it actually occurs.

Finally, $d$ is the total duration for which segments are allocated to a Ringbus request.


The observed means of these quantities with one processor in a slice are as follows:

$t_a$ = 2.17 μsec (We present a histogram of the access time in section 3.3.2.1.)

$t_{(norm)}^{BPRNtoREQ}$ = 230 nsec

$t_{latch}$ = 150 nsec

$t_{start}$ = 380 nsec

$t_{trans}$ = 1.38 μsec

$d$ = 9.1 arbiter clock periods, i.e. 1.82 μsec. (The arbiter clock period was 200 nsec for all the
measurements reported in this Appendix, as mentioned earlier.) $d$ was either 9 or 10 arbiter
clock periods.
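These means can be cross-checked with a few lines of arithmetic. The sketch below assumes only what the definitions above state: $t_{start}$ runs from initiation to latching, so it should equal $t_{(norm)}^{BPRNtoREQ} + t_{latch}$, and, using the decomposition $t_a = t_{start} + t_{arb} + t_{trans}$ given later in this section, the arbitration delay (not listed among the means) can be inferred. The constant and function names are hypothetical; times are integer nanoseconds copied from the list.

```python
# Observed means for a Ringbus read, one processor in the slice (nsec).
ARBITER_PERIOD_NS = 200
T_A = 2170       # access time t_a
T_NORM = 230     # t_(norm)^BPRNtoREQ
T_LATCH = 150    # t_latch
T_START = 380    # t_start
T_TRANS = 1380   # t_trans


def infer_t_arb(t_a: int, t_start: int, t_trans: int) -> int:
    """Arbitration delay implied by t_a = t_start + t_arb + t_trans."""
    return t_a - t_start - t_trans


# t_start covers initiation -> REQ* generated -> REQ* latched by the arbiter.
assert T_NORM + T_LATCH == T_START

t_arb = infer_t_arb(T_A, T_START, T_TRANS)
print(f"t_arb ~ {t_arb} nsec = {t_arb / ARBITER_PERIOD_NS:.2f} arbiter periods")
```

The inferred 410 nsec is consistent with the statement later in this section that $t_{arb}$ is 2 arbiter clock periods.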

Since the Multibus signals are asynchronous with respect to the arbiter clock, one would
expect $t_{latch}$ to be half an arbiter clock period, i.e. 100 nsec. In actual fact it is a little more than
this (as can be seen above) due to the delay contributed by a preliminary sampling stage
incorporated in the arbiter to inhibit metastability in the final sampling of REQ*. ($t_{latch}$ is measured
with respect to this final sampling.)
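The half-period expectation can be reproduced with a small sketch. It assumes that REQ* edges arrive at a phase uniformly distributed relative to the 200 nsec arbiter clock and are latched on the next rising edge; it deliberately omits the preliminary sampling stage, which is why it predicts 100 nsec rather than the observed 150 nsec. The function names are hypothetical.

```python
ARBITER_PERIOD_NS = 200


def latch_delay(arrival_ns: float, period_ns: float = ARBITER_PERIOD_NS) -> float:
    """Time from an asynchronous REQ* edge to the next rising clock edge.

    Rising edges are assumed at 0, period_ns, 2*period_ns, ...; an edge that
    coincides exactly with a clock edge is taken to be latched one full period
    later (the worst case).
    """
    return period_ns - (arrival_ns % period_ns)


def mean_latch_delay(samples: int = 200, period_ns: float = ARBITER_PERIOD_NS) -> float:
    """Average latch delay over arrival phases spread evenly across one period.

    Evenly spaced phases (k + 0.5) * period / samples stand in for a uniform
    random phase; the large offset shows that only the phase matters.
    """
    offset = 1000 * period_ns
    total = 0.0
    for k in range(samples):
        arrival = offset + (k + 0.5) * period_ns / samples
        total += latch_delay(arrival, period_ns)
    return total / samples


print(mean_latch_delay())  # 100.0, half an arbiter period
```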

The start overhead, $t_{start}$, and the access time, $t_a$, vary with the spacing between the
termination and initiation of successive Ringbus accesses generated by the slice in which the processor is
located. (Of course, $t_a$ also varies with the rate and type of Ringbus accesses generated by other
slices.) In section 2.9.2 we defined this spacing to be the processing time of the single processor
equivalent of this slice and we denoted it by $t_p^{MBeqv}$. Figure A.8 depicts the same signals as in
Figure A.7 for a Ringbus read access with $t_p^{MBeqv} = 0$. (This was achieved with only two processors on
a slice, each executing

loop:   bra     loop

out of the global memory located in some other slice.)

† By initiation and termination of the access we mean the first rising edge of BCLK* before BPRN* goes low
and the first rising edge of BCLK* before BPRN* goes high, respectively, as defined in section 2 of this
Appendix.

Figure A.8: Typical Ringbus read access with $t_p^{MBeqv} = 0$


The reason that $t_{start}$ and $t_a$ vary with $t_p^{MBeqv}$ is that the ENM* signal remains active for a
period after the termination of a Ringbus access. If $t_p^{MBeqv}$ is small enough, as in Figure A.8, the
ENM* signal remains active past the initiation of the next Ringbus access and delays the assertion of
the REQ* signal (since the REQ* signal for the present access cannot be asserted until the ENM*
signal for the previous access has been deasserted).
We define $t_{(norm)}^{BPRNtoREQ}$ to be the time from the initiation of a Ringbus access to the assertion of REQ* if
REQ* is not delayed by the ENM* from the previous cycle. Thus the actual time from initiation to
the assertion of REQ* is $t^{BPRNtoREQ} = t_{(norm)}^{BPRNtoREQ} + t_{delay}$, where $t_{delay} \ge 0$ is the extra delay caused
by the ENM* of the previous access. The duration for which ENM* remains active past the
termination of the previous access is $t_{start}^{prev} + t_{arb}^{prev} + d^{prev} - t_a^{prev}$, where the superscript prev
denotes the quantity in the previous access.

If $t_{delay} > 0$ then, ignoring a small propagation delay, REQ* is asserted at the same time that
ENM* from the previous access is deasserted. Since ENM* is synchronous with LCLK, REQ* is then
latched on the following arbiter clock edge, so $t_{start}$ equals $t^{BPRNtoREQ}$ plus one arbiter
period. Therefore


$$t_{start} = \begin{cases} t_{(norm)}^{BPRNtoREQ} + t_{latch} & \text{if } t_{delay} = 0 \\ t_{(norm)}^{BPRNtoREQ} + t_{delay} + 200\text{ nsec} & \text{if } t_{delay} > 0 \end{cases}$$

where $t_{latch}$ is the latch time when $t_{delay} = 0$. Finally $t_a = t_{start} + t_{arb} + t_{trans}$. Measurements
revealed that the distribution of $t_{trans}$ is approximately the same (fairly uniform over the interval
1.30 to 1.45 μsec) regardless of $t_p^{MBeqv}$ (although its mean is slightly different for reads and writes).
Thus $t_{start}$ is the sole contributor to the change in $t_a$ as $t_p^{MBeqv}$ varies. Thus


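$$t_a = \begin{cases} t_{(norm)}^{BPRNtoREQ} + t_{latch} + t_{arb} + t_{trans} & \text{if } t_{delay} = 0 \\ t_{(norm)}^{BPRNtoREQ} + t_{arb} + t_{trans} + t_{delay} + 200\text{ nsec} & \text{if } t_{delay} > 0 \end{cases}$$

or simply

$$t_a = \begin{cases} t_a^{(norm)} & \text{if } t_{delay} = 0 \\ t_a^{(norm)} - t_{latch} + t_{delay} + 200\text{ nsec} & \text{if } t_{delay} > 0 \end{cases}$$

where $t_a^{(norm)}$ is the access time if $t_{delay} = 0$.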
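The two cases translate directly into functions. This is a sketch of the equations above with times in integer nanoseconds; the function names are hypothetical, $t_{delay}$ is taken as an input rather than derived from the previous access, and the inputs are illustrative values close to the read means in this appendix (with $t_{arb}$ taken as 2 arbiter clock periods).

```python
ARBITER_PERIOD_NS = 200


def start_overhead(t_norm: int, t_latch: int, t_delay: int) -> int:
    """t_start: the normal path if REQ* is not held off; otherwise REQ* rises when
    the previous ENM* falls (t_norm + t_delay) and is latched one period later."""
    if t_delay == 0:
        return t_norm + t_latch
    return t_norm + t_delay + ARBITER_PERIOD_NS


def access_time(t_norm: int, t_latch: int, t_arb: int, t_trans: int, t_delay: int) -> int:
    """t_a = t_start + t_arb + t_trans."""
    return start_overhead(t_norm, t_latch, t_delay) + t_arb + t_trans


def access_time_from_norm(t_a_norm: int, t_latch: int, t_delay: int) -> int:
    """The equivalent 'simply' form, expressed through t_a^(norm)."""
    if t_delay == 0:
        return t_a_norm
    return t_a_norm - t_latch + t_delay + ARBITER_PERIOD_NS


# Illustrative inputs in nsec.
T_NORM, T_LATCH, T_ARB, T_TRANS = 230, 150, 400, 1380
t_a_norm = access_time(T_NORM, T_LATCH, T_ARB, T_TRANS, 0)
for delay in (0, 100, 250):
    # Both forms of the equation must agree for every t_delay.
    assert access_time(T_NORM, T_LATCH, T_ARB, T_TRANS, delay) == \
        access_time_from_norm(t_a_norm, T_LATCH, delay)
```

With these inputs a delay of 250 nsec lengthens the access from 2160 to 2460 nsec. The step is $t_{delay} + 200 - t_{latch}$ nsec, so even a very small $t_{delay}$ costs about 50 nsec: the latch wait jumps from its mean of 150 nsec to a full arbiter period.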

In section 3.9 we defined the spacing between the completion of a Ringbus grant and the
initiation of the next Ringbus grant from the same slice, excluding the waiting time of the Ringbus
requests, to be the Ringbus equivalent processing time, which we denoted by $t_p^{RBeqv}$.

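If $t_{delay} = 0$, $t_p^{RBeqv}$ is $t_p^{MBeqv}$ plus $t_{start}$ and $t_{arb}$, less the duration for which ENM*
remains active past the termination of the previous access. If $t_{delay} > 0$, $t_p^{RBeqv}$ is three arbiter
clock periods, i.e.

$$t_p^{RBeqv} = \begin{cases} t_p^{MBeqv} + t_{start} + t_{arb} - \left(t_{start}^{prev} + t_{arb}^{prev} + d^{prev} - t_a^{prev}\right) & \text{if } t_{delay} = 0 \\ 600\text{ nsec} & \text{if } t_{delay} > 0 \end{cases}$$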


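Note that if $t_{delay} = 0$, then $t_p^{MBeqv} + t_{(norm)}^{BPRNtoREQ} \ge t_{start}^{prev} + t_{arb}^{prev} + d^{prev} - t_a^{prev}$ (the previous
ENM* has been deasserted by the time REQ* would normally be generated) and
$t_{start} = t_{(norm)}^{BPRNtoREQ} + t_{latch}$, yielding

$$t_p^{RBeqv} \begin{cases} \ge t_{latch} + t_{arb} & \text{if } t_{delay} = 0 \\ = 600\text{ nsec} & \text{if } t_{delay} > 0 \end{cases}$$

Now $t_{latch} > 0$ and $t_{arb} = 2$ arbiter clock periods; thus $t_p^{RBeqv}$ exceeds 2 arbiter clock periods
when $t_{delay} = 0$ and equals 3 arbiter clock periods when $t_{delay} > 0$.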
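The two cases for $t_p^{RBeqv}$ can be sketched as a single function (times in nanoseconds). The parameter `enm_overhang` is a hypothetical name for the duration the previous ENM* stays active past the previous access's termination, and the line computing `t_delay` is our reading of the timing relationships described above, an assumption rather than a formula stated in the source.

```python
ARBITER_PERIOD_NS = 200


def rb_equiv_processing_time(tp_mb: int, t_norm: int, t_latch: int, t_arb: int,
                             enm_overhang: int) -> int:
    """Spacing between the end of one Ringbus grant and the start of the next.

    tp_mb:        t_p^MBeqv, gap between termination of one access and
                  initiation of the next from the same slice.
    enm_overhang: time the previous ENM* stays active after that access ends
                  (hypothetical parameter name).
    """
    # Assumption: REQ* is held off only if the previous ENM* is still active
    # t_norm after initiation, when REQ* would normally be generated.
    t_delay = max(0, enm_overhang - tp_mb - t_norm)
    if t_delay > 0:
        # REQ* rises as ENM* falls, is latched one period later, then t_arb
        # (= 600 nsec when t_arb is 2 arbiter periods).
        return ARBITER_PERIOD_NS + t_arb
    t_start = t_norm + t_latch
    return tp_mb + t_start + t_arb - enm_overhang
```

At the boundary where REQ* is just not delayed, the result equals the lower bound $t_{latch} + t_{arb}$ (550 nsec with $t_{latch} = 150$ and $t_{arb} = 400$).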
The observed means of the quantities in Figure A.8 (read accesses with $t_p^{MBeqv} = 0$) are

$t_a$ = 2.47 μsec
90  nsec 
$t_{trans}$ = 1.38 μsec

$d$ = 9 arbiter clock periods, i.e. 1.80 μsec (the duration was consistently 9 clock periods).

The situation just discussed for Ringbus read accesses is similar for Ringbus write accesses.
The observed means for write accesses are summarized in Table A.1.


                              $t_p^{MBeqv}$ large      $t_p^{MBeqv} = 0$

$t_a$                         2.13 μsec                2.58 μsec
$t_{(norm)}^{BPRNtoREQ}$      230 nsec                 n/a
$t_{start}$                   380 nsec                 830 nsec
$t_{trans}$                   1.35 μsec                1.35 μsec
$d$                           9.6 × 200 nsec           9.5 × 200 nsec

Table A.1


These figures reveal several things. First, for large $t_p^{MBeqv}$, $t_{(norm)}^{BPRNtoREQ}$ and $t_{start}$ are the
same for reads and writes, while $t_a$ is slightly less for writes than for reads. Second, for $t_p^{MBeqv} = 0$,
both $t_{start}$ and $t_a$ are larger for successive write accesses than for successive read accesses. This is
due to the fact that ENM* remains active for a longer interval after the termination of a write
access than after the termination of a read access. Thus the access time of a read or write depends
on the type of access preceding it. We only investigated cases with reads preceding reads and
writes preceding writes. Third, $d$ is slightly greater for write accesses than for read accesses for
both large $t_p^{MBeqv}$ and $t_p^{MBeqv} = 0$.
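Inverting the simple form of $t_a$ gives the mean $t_{delay}$ implied by each pair of measured access times, which makes the read/write difference explicit. The sketch assumes $t_{latch} = t_{start} - t_{(norm)}^{BPRNtoREQ} = 150$ nsec for both reads and writes (the large-spacing values agree for both) and treats the means as if they obeyed the equation exactly; the names are hypothetical.

```python
ARBITER_PERIOD_NS = 200
T_NORM = 230
T_LATCH = 380 - T_NORM  # t_start - t_norm for large spacing, same for reads and writes

# Mean access times in nsec: (large t_p^MBeqv, t_p^MBeqv = 0)
MEANS = {"read": (2170, 2470), "write": (2130, 2580)}


def implied_t_delay(t_a_zero: int, t_a_norm: int, t_latch: int = T_LATCH) -> int:
    """Solve t_a = t_a_norm - t_latch + t_delay + 200 nsec for t_delay."""
    return t_a_zero - t_a_norm + t_latch - ARBITER_PERIOD_NS


for kind, (t_a_norm, t_a_zero) in MEANS.items():
    t_delay = implied_t_delay(t_a_zero, t_a_norm)
    t_start = T_NORM + t_delay + ARBITER_PERIOD_NS
    print(f"{kind}: t_delay ~ {t_delay} nsec, predicted t_start ~ {t_start} nsec")
```

For writes the predicted $t_{start}$ of 830 nsec matches Table A.1, and the larger implied delay for writes (400 versus 250 nsec) is consistent with ENM* remaining active longer after a write.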

3.3.2.1  Ringbus Access Time with Other Memory Port Unloaded

The observed access time distributions for this case, for reads and writes and for large $t_p^{MBeqv}$
and $t_p^{MBeqv} = 0$, are given in Figure A.9.

Figure A.9(d): Ringbus write access time distribution - $t_p^{MBeqv} = 0$

Note that $t_p^{MBeqv}$ does not have much effect on the distributions except for a horizontal shift
reflecting the larger mean. The horizontal shift is indicative of the duration for which ENM*
remains active past the termination of the previous access. As mentioned earlier, this duration,
and hence the mean, depends on the type of access preceding the observed access. It seems that
most of the randomness in the Ringbus access time, at least for reads preceded by reads and
writes preceded by writes, is due to the random arrivals of the REQ* signal with respect to the
arbiter clock.

3.3.2.2  Ringbus Access Time with Other Memory Port Loaded

We loaded the Multibus port of a slice global memory with four processors executing

loop:   bra     loop

out of the slice global memory on the Multibus. We observed the access times for accesses to that
same global memory over the Ringbus from another slice. No significant difference was observed
in the access time distributions for reads and writes for large $t_p^{MBeqv}$. Our observations for large
$t_p^{MBeqv}$ are summarized in Figure A.10.

Figure A.10: Ringbus access time distribution - $t_p^{MBeqv}$ large and other port loaded

For $t_p^{MBeqv} = 0$, the access time distributions are similar, except for a horizontal shift. Note that for
small enough $t_p^{MBeqv}$, the distribution, through the mean, does vary with the type of access
observed and the type of access preceding it.

3.4  Access  Times:  General  Observations 

•  The access time distribution is approximately the same for Multibus read and write accesses.

•  The mean Multibus access time varies from 1.05 μsec to about 1.2 μsec depending on the
   loading on the other memory port.

•  The access time distribution for Ringbus accesses depends on four factors:

      1)  the type of access,

      2)  the type of the preceding access,

      3)  the value of $t_p^{MBeqv}$, and

      4)  the loading on the other port of the accessed memory.

   The second factor is only relevant when $t_p^{MBeqv}$ is small.

•  The mean Ringbus access time varies from about 2.13 μsec to about 2.58 μsec depending on
   the above four factors.

•  The tail of the access time distribution increases as the loading on the other port of the
   accessed memory increases.

•  Reads and writes have a Ringbus grant duration (i.e. the duration for which segments are
   allocated) of 9 or 10 arbiter clock periods.

•  $t_p^{RBeqv}$ cannot be less than 2 arbiter clock periods when $t_{delay} = 0$, or 3 arbiter clock
   periods when $t_{delay} > 0$.
3.5  Read-Modify-Write Access Time

A test and set instruction has an access time of about 2.60 μsec to 2.70 μsec on the Multibus
and an access time of about 4.J0 μsec on the Ringbus. (These figures are with the other memory
port unloaded in each case.) We did not determine distributions for these two cases. For a
Ringbus access, the segments are allocated to the access for about 19 arbiter clock periods.




Appendix  B 


In this appendix we present the proofs for the various lemmas and theorems which would
have hindered the flow of presentation if they had been included in the main text.

Theorem  2.1 

With independent identical processors with deterministic processing time $t_p$ and deterministic
access time $t_a$ served by a single bus in FCFS order, the waiting time per request after at most two
cycles of every processor is the same for every request. Moreover, after at most two cycles of
every processor the FCFS queue is either always empty or always nonempty at the instant a
request arrives at the queue.


Proof: 

Let there be $N$ processors denoted by $0, 1, 2, 3, \ldots, N-1$. Let $t_i(n)$ denote the time
at which processor $i$ makes its $n$th request for the bus (i.e. the instant that processor $i$'s
$n$th request arrives at the end of the queue). Let $w_i(n)$ denote the waiting time of
processor $i$'s $n$th request. To simplify the presentation we choose to interpret $t_i(n)$
and $w_i(n)$ as $t_{i \bmod N}(n + \lfloor i/N \rfloor)$ and $w_{i \bmod N}(n + \lfloor i/N \rfloor)$ respectively whenever
$i < 0$ or $i \ge N$. We then have

$$t_i(n+1) = t_i(n) + w_i(n) + t_a + t_p \qquad (2.1.1)$$

Without loss of generality start counting the requests made by each processor at the
first instant at which all $N$ processors have made at least one request for the bus and
let the initial condition be

$$t_0(1) \le t_1(1) \le t_2(1) \le \cdots \le t_{N-1}(1),$$

with ties being broken by the FCFS service discipline in favor of the smallest
numbered processor.
Let the interval between the requests of processor $i$ and processor $(i+1) \bmod N$ be
denoted by $\Delta t_i(n)$, where

$$\Delta t_i(n) = t_{i+1}(n) - t_i(n),$$

where again we interpret $\Delta t_i(n)$ as $\Delta t_{i \bmod N}(n + \lfloor i/N \rfloor)$. From equation 2.1.1 we
have

$$\Delta t_i(n+1) = \Delta t_i(n) + w_{i+1}(n) - w_i(n). \qquad (2.1.2)$$

Because of the deterministic processing and access times, requests remain in their initial
ordering for all $n \ge 1$; thus the FCFS queue enforces

$$w_{i+1}(n) = \max\bigl(0,\; w_i(n) + t_a - \Delta t_i(n)\bigr). \qquad (2.1.3)$$

With equations 2.1.2 and 2.1.3 we obtain

$$\Delta t_i(n+1) = \begin{cases} t_a & \text{if } w_{i+1}(n) > 0 \\ \Delta t_i(n) - w_i(n) & \text{if } w_{i+1}(n) = 0 \end{cases}$$

But if $w_{i+1}(n) = 0$, then $-w_i(n) + \Delta t_i(n) \ge t_a$. Thus $\Delta t_i(n+1) \ge t_a$, or more
specifically

$$\Delta t_i(n) \ge t_a, \qquad i \ge 0,\; n \ge 2 \qquad (2.1.4)$$

i.e. after the first cycle of every processor, successive requests must arrive at intervals
of at least the access time $t_a$. Equations 2.1.3 and 2.1.4 imply that
$w_{i+1}(n) \le w_i(n)$ for $n \ge 2$, with equality if and only if $w_i(n) = 0$ or $\Delta t_i(n) = t_a$
(or both).

Therefore if $w_i(n) = 0$ for any $i \ge 0$ and any $n \ge 2$, then $w_i(n) = 0$ for every $i \ge 0$ and
every $n$ past that point; and if $w_i(n) > 0$ then $\Delta t_{i-1}(n+1) = t_a$, implying that

$$w_i(n+1) = w_{i-1}(n+1).$$

Now either $w_i(2) = 0$ for some $i \ge 0$, in which case $w_i(n) = 0$ for all $i \ge 0$ and $n \ge 3$, or
$w_i(2) > 0$ for all $i \ge 0$, in which case $\Delta t_{i-1}(3) = t_a$ and $w_i(3) = w_{i-1}(3)$ for all $i \ge 0$, which
in turn implies that $\Delta t_i(n) = t_a$ and $w_i(n) = w_{N-1}(2)$ (by equations 2.1.2 and 2.1.3
respectively) for all $i \ge 0$ and $n \ge 3$. Therefore $w_i(n) = c$ for all $i \ge 0$ and $n \ge 3$, where
$c$ is a constant.

Thus the waiting time per request is the same for every request for $n \ge 3$. Moreover,
the waiting time per request for $n \ge 3$ is either always zero or always strictly positive,
implying that the FCFS queue is either always empty or always nonempty respectively
at the instant a request arrives at the queue.
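Theorem 2.1 can be illustrated numerically. The sketch below simulates $N$ identical processors with deterministic $t_p$ and $t_a$ sharing one FCFS bus (ties broken toward the smaller processor number, as in the proof) and records each request's waiting time. The function name and the integer time units are assumptions made for the example.

```python
def simulate_fcfs(n_proc, t_a, t_p, first_requests, cycles):
    """Return waits[i][n] = waiting time of processor i's (n+1)th request.

    Each processor has at most one outstanding request; after its access
    completes it computes for t_p and then requests the bus again. Serving the
    earliest pending request (smallest index on ties) is FCFS for this closed
    system.
    """
    next_req = list(first_requests)
    waits = [[] for _ in range(n_proc)]
    bus_free = 0
    for _ in range(cycles * n_proc):
        i = min(range(n_proc), key=lambda k: (next_req[k], k))
        arrival = next_req[i]
        start = max(arrival, bus_free)
        waits[i].append(start - arrival)
        bus_free = start + t_a
        next_req[i] = bus_free + t_p
    return waits


waits = simulate_fcfs(n_proc=4, t_a=2, t_p=3, first_requests=[0, 0, 0, 0], cycles=6)
for i, w in enumerate(waits):
    print(i, w)
```

Here the queue is saturated, and every request from each processor's third onward waits the same 3 time units. With $N = 2$, $t_a = 2$, and $t_p = 5$ the same function gives zero waits from the third request on, the "always empty" case of the theorem.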
Existence and Ergodicity Assumptions of Section 2.3

1.  We assume that a stationary probability distribution exists for $t_w$.

2.  We assume that the waiting time process is ergodic.

3.  We assume that the time averages necessary for any application of Little's Law to the queueing
system described in section 2.1 exist.

Some  Conditions  Guaranteeing  the  Validity  of  these  Assumptions 

We first describe the basic G/G/1//N queueing system as a Markov process and then consider
some conditions on this Markov process to show the validity of the above assumptions. We
assume throughout that:

1)  the processing time, $t_p$, at each processor is a random variable with a stationary distribution
and $E[t_p] < \infty$,

2)  the access time, $t_a$, is a random variable with a stationary distribution and $E[t_a] < \infty$, and

3)  the processing time random variables (one for each processor) and the access time random
variable are mutually independent.

Let $q_n$ be an $(N-1) \times 1$ column vector, i.e. $q_n = (q_1, q_2, \ldots, q_{N-1})^T$, whose elements indicate the time
that a request enters the queue (i.e. the time that a processor makes a request) relative to the time
at which the request presently in service began its service. Let the elements of $q_n$ be ordered so
that $q_1 \le q_2 \le \cdots \le q_{N-1}$. Consider the vector $x_n = \begin{pmatrix} w_n \\ q_n \end{pmatrix}$, where $w_n$ is the waiting time of the $n$th
request to arrive at the queue. We then have $x_{n+1} = f(x_n, t_{a_n}, t_{p_n})$, where $t_{a_n}$ is the service time
(access time) of the $n$th request to arrive at the queue, $t_{p_n}$ is the processing time of the $n$th
request, and $f(\cdot)$ is a deterministic operation on its arguments defined as below.

The operation $f(\cdot)$:

1.  Insert $t_{a_n} + t_{p_n}$ in the ordered list defined by $q_n$ so that the elements remain in nondecreasing
order with respect to the element indices. The list now contains $N$ elements $q'_1, q'_2, \ldots, q'_N$
obeying $q'_1 \le q'_2 \le \cdots \le q'_N$. The inserted request represents the time at which the next
request from the processor whose request is presently in service again enters the queue, relative
to the time at which its present request began service.

2.  $w_{n+1} = \max(0,\; w_n + t_{a_n} - q'_1)$, the waiting time of the next request to arrive at the queue.

3.  Subtract $q'_1$ from all $N$ entries in the ordered list $q'_1, q'_2, \ldots, q'_N$ and discard the resulting zero
value, i.e. $(q_{n+1})_i = q'_{i+1} - q'_1$, where $(q_{n+1})_i$ is the $i$th element of $q_{n+1}$. This subtraction
updates the request arrival times relative to the time at which the next (i.e. the $(n+1)$th)
request began service.

The sequence $\{x_n\}$, $n \ge 1$, with some initial probability distribution $\Pr(x_0 \le y)$†, describes a
discrete time continuous state Markov process with stationary transition probabilities (since $f(\cdot)$
is deterministic and $t_{a_n}$ and $t_{p_n}$ are stationary random variables).

Let $p^{(v)}(x, A)$ denote the $v$ step transition probabilities, i.e. $p^{(v)}(x, A)$ is the probability that the
state lies in a set $A \subseteq \mathbb{R}^N$ after $v$ transitions, starting from $x$. If we define $p(x, A) = p^{(1)}(x, A)$ and
$p(A) = \Pr(x_0 \in A)$ then we have

$$p^{(v+1)}(x, A) = \int p^{(v)}(y, A)\, p(x, dy) \quad \text{for } v \ge 1$$

and

$$\Pr(x_n \in A) = \begin{cases} p(A), & n = 0 \\ \int p^{(n)}(y, A)\, p(dy), & n \ge 1 \end{cases}$$

If $\Pr(x_n \in A)$ is independent of $n$ then the process described by $\{x_n\}$ is strictly stationary and
$p(\cdot)$ is called a stationary probability distribution.

We define the sequence $\{x_n\}$ to be periodic with period $M$ if $x_n = x_{n+M}$ for $n \ge m$ for
some integer $m > 0$ and some $M < \infty$.

We are now prepared to consider some conditions guaranteeing the validity of the Existence
and Ergodicity Assumptions.

Case 0: $t_{w_n} = 0$ for every $n > m$ for some $m > 0$. In this trivial case $\lim_{n\to\infty} \Pr(t_{w_n} \le y)$ certainly
exists and $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} t_{w_i} = 0$. Thus a stationary probability distribution exists for $t_{w_n}$, and $t_{w_n}$
is ergodic. The application of Little's Law in this case is just an academic exercise, since
the majority of useful information has already been conveyed by the fact that $t_{w_n} = 0$ for
$n > m$. We note that if $t_{w_n} = 0$ for $n > m$, then the $N$ processor system is really $N$
independent subsystems. Since each of these simple subsystems is closed, the time averages
must be finite. Furthermore, due to the extremely simple structure and the stationarity of the
probability distributions, the time averages cannot fail to exist due to periodicities. Therefore
the time averages for each subsystem must exist. And since all the subsystems are independent,
we conclude that the time averages for the entire system must exist.

† Here $\le$ means less than or equal element-wise.

We assume in the following that we do not have $t_{w_n} = 0$ for every $n > m$ for some $m$.

Case 1: The Markov process $\{x_n\}$ satisfies Hypothesis D of Doob [p.192 of Ref. D1], which
roughly stated is the following:

Hypothesis D

There is a probability assignment of sets $A \subseteq \mathbb{R}^N$, an integer $v \ge 1$, and a positive $\epsilon$, such that
$p^{(v)}(x, A) \le 1 - \epsilon$ if $\Pr(A) \le \epsilon$.

(A more precise statement in terms of Borel sets and measures is given by Doob.) This
hypothesis basically says that if $\Pr(A)$ is small then $p^{(v)}(x, A)$ is uniformly bounded away
from 1. In particular this means that $\{x_n\}$ cannot be periodic, since then $p^{(v)}(x_n, \{x_{n+v}\}) = 1$ for
all $v \ge 1$ and $n \ge m$. If a density function $p_0(x, y)$ exists (i.e. $p_0(x, y) \ge 0$, $\int_{\mathbb{R}^N} p_0(x, y)\, dy = 1$,
and $p(x, A) = \int_A p_0(x, y)\, dy$) and is bounded, then Hypothesis D is satisfied [Ref. D1, p.193].
This condition is somewhat stronger than Hypothesis D and excludes impulses in $p_0(x, y)$
(i.e. discontinuities in $p(x, A)$). Hypothesis D does not exclude discontinuities in $p(x, A)$ as
long as $p(x, A) < 1$ for all $x$ and all $A$ for which $\Pr(A)$ is small.


Now since we occasionally have $t_{w_n} > 0$ for $n > m$ for any $m$ (by assumption), all $N$ subsystems
must communicate; hence $\{x_n\}$ consists of a single communicating class (or ergodic set, as
Doob calls it). Doob's Theorem 5.7 [Ref. D1, p.214] then asserts that under Hypothesis D
there exists a unique stationary probability distribution for $x_n$, independent of $x_0$. This
implies that a stationary probability distribution exists for $t_{w_n}$, i.e. $\lim_{n\to\infty} \Pr(t_{w_n} \le y)$ exists.
Furthermore, Doob's Theorem 2.1 [Ref. D1, p.465] (see also Theorem 6.1 and its proof on
p.219) asserts that

$$\bar{t}_w = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} t_{w_i},$$

i.e. $t_w$ is ergodic.

All the time averages necessary for any application of Little's Law to the queueing system
described by the Markov process $\{x_n\}$ can be derived from this Markov process. Any particular
time average of interest can be expressed as the time average of some random variable
which is a deterministic function of $x_n$, $t_{a_n}$, and $t_{p_n}$. For example, if $a_n$ represents the
interval between the arrival of the $n$th and the $(n+1)$th request to be served in the
G/G/1//N system, then the time average reciprocal of the arrival rate (if it exists) is
$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} a_i$, where $a_n = \min\bigl((q_n)_1,\; t_{a_n} + t_{p_n}\bigr)$. As another example, if $n_n$ represents the number
of requests in the G/G/1//N queue waiting for service when the $n$th request begins service, then
the time average queue length (if it exists) can be written as a limiting average of the products
$n_i t_{a_i}$, where $n_n$ is computed in a straightforward manner from $x_n$.* Since the Markov process $\{x_n\}$
has a unique stationary probability distribution and $t_{a_n}$ and $t_{p_n}$ have stationary probability
distributions, any random variable which is a deterministic function of these quantities will also
have a stationary probability distribution. Doob's Theorem 2.1 [Ref. D1, p.465] then implies that
the time average of such a random variable exists (since it must equal its mean, which exists since a
stationary probability distribution exists).

Case 2: The Markov process $\{x_n\}$ is periodic.

Lemma B.1

The Markov process $\{x_n\}$ is periodic if and only if $t_{a_n}$ and $t_{p_n}$ are deterministic random
variables, i.e. constants for all $n > 0$.

Proof:

The "if" part: If $t_{a_n}$ and $t_{p_n}$ are deterministic random variables, then by Theorem
2.1 $t_{w_n}$ is a constant for $n \ge 3N$, where $N$ is the number of processors. Now $t_{w_n}$,
$t_{a_n}$, and $t_{p_n}$ constant for $n \ge 3N$ implies that $\{x_n\}$ is periodic with period at most
$N$.

The "only if" part: Suppose that $\{x_n\}$ were periodic and $t_{a_n}$ and $t_{p_n}$ were not both
deterministic random variables. Consider some state $x_n$ of the periodic portion of
the sequence $\{x_n\}$. Then the next state depends on $t_{a_n}$ and $t_{p_n}$. But because $\{x_n\}$
is periodic, this next state, $x_{n+1}$, is already known with probability 1. Since $t_{a_n}$
and $t_{p_n}$ are not both deterministic random variables (and since both are stationary
random variables), there is some positive probability of the sum $t_{a_n} + t_{p_n}$ being
such that some element in the ordered list obtained via the $f(\cdot)$ operation is
different from that in the known next state. This contradicts the hypothesis that
$\{x_n\}$ is periodic.

* Note that these time averages are different from, but equivalent to, those in the statement of Little's Law in
section 2.3, if they exist.


Corollary B.1

If the Markov process $\{\mathcal{X}_n\}$ is periodic, then

1. $t_{w_n}$ is a constant for $n > m$ for $m > 0$ large enough, and thus $\lim_{n\to\infty} \Pr(t_{w_n} \le y)$ exists, i.e. a stationary probability distribution exists for $t_{w_n}$;

2. $\bar{t}_w = \lim_{n\to\infty} t_{w_n}$ exists;

3. the time averages necessary for any application of Little's Law to the queueing system described by $\{\mathcal{X}_n\}$ exist.


Proof: 

Points 1 and 2 follow immediately from Lemma B.1 and Theorem 2.1. Since the Markov process $\{\mathcal{X}_n\}$ is periodic, any time average derived from $\{\mathcal{X}_n\}$ is equal to the same average over one period of $\{\mathcal{X}_n\}$. Since by hypothesis the period of $\{\mathcal{X}_n\}$ is finite, all possible averages over one period of $\{\mathcal{X}_n\}$ must exist, and hence all possible time averages derived from $\{\mathcal{X}_n\}$ must also exist. Point 3 now follows, since the set of time averages necessary for any application of Little's Law to the queueing system described by $\{\mathcal{X}_n\}$ is a subset of all possible time averages derived from $\{\mathcal{X}_n\}$.
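The step used in the proof above, that any time average over an eventually periodic sequence equals the average over one period, can be seen numerically. The sequence below (a transient prefix followed by a period-3 tail) is a hypothetical stand-in, not one produced by the queueing model; exact rational arithmetic is used so that no rounding obscures the limit.

```python
from fractions import Fraction
from itertools import cycle, islice


def running_average(seq, n):
    """Exact average of the first n terms of an iterable."""
    return sum(Fraction(x) for x in islice(seq, n)) / n


# Hypothetical eventually periodic sequence: a transient prefix, then a tail
# that repeats with period 3 (average over one period = 9 / 3 = 3).
PREFIX = [7, 0, 5]
PERIOD = [1, 2, 6]


def sequence():
    yield from PREFIX
    yield from cycle(PERIOD)


for n in (30, 300, 3000):
    print(n, float(running_average(sequence(), n)))  # 3.1, 3.01, 3.001
```

The influence of the finite prefix decays like $1/n$, so the time average converges to the one-period average, which exists because the period is finite.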


Remark: 

The above three cases are not necessarily exhaustive.


Theorem 2.2

Consider the queueing model described in section 2.1 with stationary processing and access time distributions with means $\bar{t}_p < \infty$ and $\bar{t}_a < \infty$ respectively, and subject to the assumptions in section 2.3. Then $w(N+1) - w(N) \le \bar{t}_a$, where $w(N)$ denotes the mean waiting time in an $N$ processor model.

Proof: 

The $N$ processor model is a G/G/1//N queueing system as described in section 2.1. Let the processing and access time distributions of this G/G/1//N system be denoted by $F_p(x)$ and $F_a(x)$ respectively. The $N+1$ processor model is a G/G/1//N+1 queueing system with the same processing and access time distributions, $F_p(x)$ and $F_a(x)$ respectively, as for the G/G/1//N system. In the remainder of the proof the G/G/1//N system and the G/G/1//N+1 system are referred to as the G/G/1//N/P system and the G/G/1//N+1/P system respectively, to emphasize the special relationship between the two systems. The additional P denotes "pair".

Let the state of the G/G/1//N system at time $t$ be described by:

$$X(t,N) = (n, x, t_{p_1}, t_{p_2}, \ldots, t_{p_N})$$

where $N$ denotes the number of processors, $n$ indicates the number of requests queued for service and presently in service, $x$ is the residual access time, and $t_{p_i}$, $1 \le i \le N$, is the residual processing time at processor $i$.

It is not necessary to include $t_{p_i}$ in the state description when processor $i$ is not processing, i.e. when processor $i$ is waiting for a request to complete; indeed, $t_{p_i}$ has no meaning in this case. However, we choose to include $t_{p_i}$ in the state for this type of situation for notational convenience (i.e. so that we can always write $X(t,N)$ in the same way, independent of which processors are processing). To ensure that $t_{p_i}$ is always well defined, we let $t_{p_i} = 0$ when processor $i$ is not processing. The analogous situation occurs with $x$ when there are no outstanding requests. In the following we refer to the $N$-tuple $t_{p_1}, t_{p_2}, \ldots, t_{p_N}$ by the vector $\mathbf{t_p}$.

Similarly, let the state of the G/G/1//N+1/P system at time $t'$ be described by $X(t', N+1) = (n, x, t_{p_1}, t_{p_2}, \ldots, t_{p_{N+1}})$. The interpretation of each quantity in this state description is the same as for the G/G/1//N/P system.
The proof is based on an argument that the behaviour of a G/G/1//N/P system with $n$ requests queued for and in service and the behaviour of a G/G/1//N+1/P system with $n+1$ requests queued for and in service are probabilistically identical when $n \ge 1$. The details of the argument are as follows.

Consider a state $X(t,N) = (n, x, \mathbf{t_p})$ for some $t$ and $n \ge 1$. Since the G/G/1//N+1 processor system is identical to the G/G/1//N system aside from an extra processor, for each $n \ge 1$ and for each such state there exists a corresponding state $X(t', N+1) = (n+1, x', \mathbf{t_p}')$ for some $t'$, with $x' = x$, $(t'_{p_1}, \ldots, t'_{p_N}) = \mathbf{t_p}$, and $t'_{p_{N+1}} = 0$; that is, $X(t', N+1) = (n+1, x, (\mathbf{t_p}, 0))$.
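The correspondence between states can be written as a small data structure and mapping. In this sketch the class `State`, its property `waiting_processors`, and the function `pair_state` are hypothetical names; the time $t$ is left implicit, and a processor is treated as waiting exactly when its residual processing time is 0, following the convention above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """State X(t, N) = (n, x, t_p) of the G/G/1//N model; the time t is implicit.

    n  : requests queued for and in service
    x  : residual access time (0 when there are no outstanding requests)
    tp : residual processing times (t_p1, ..., t_pN); 0 while a processor waits
    """
    n: int
    x: float
    tp: Tuple[float, ...]

    @property
    def waiting_processors(self) -> int:
        # Convention from the text: a waiting processor has t_pi = 0.
        return sum(1 for t in self.tp if t == 0.0)


def pair_state(s: State) -> State:
    """Map X(t, N) = (n, x, t_p) to X(t', N+1) = (n+1, x, (t_p, 0)).

    The extra processor N+1 is waiting, with its request placed last in the
    queue; it cannot be in service because n >= 1 requests are ahead of it.
    """
    if s.n < 1:
        raise ValueError("the pairing argument requires n >= 1")
    return State(s.n + 1, s.x, s.tp + (0.0,))


s = State(n=2, x=0.4, tp=(1.5, 0.0, 0.0))  # N = 3; processors 2 and 3 waiting
print(pair_state(s))  # State(n=3, x=0.4, tp=(1.5, 0.0, 0.0, 0.0))
```

The mapping leaves $x$ and every existing $t_{p_i}$ untouched, which is why, with one queued request hidden, the paired state is indistinguishable from the original.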


The state $X(t', N+1) = (n+1, x, (\mathbf{t_p}, 0))$ differs from the state $X(t,N) = (n, x, \mathbf{t_p})$ (aside from the possible difference in times $t$ and $t'$) only in that there is one more request in the queue. But for $n \ge 1$, this additional request in the queue cannot be receiving service (without loss of generality we can consider the additional request to be the last request in the queue). Furthermore, the processing times at each processor and the access time of each request are independent of each other and of everything else in the system, including the additional request in the queue. Therefore, for $n \ge 1$, the system operation cannot depend on the fact that there is an additional request in the queue. Thus, for $n \ge 1$, the probabilistic behaviour in states $X(t,N) = (n, x, \mathbf{t_p})$ and $X(t', N+1) = (n+1, x, (\mathbf{t_p}, 0))$ must be identical. Since, given any state $X(t,N) = (n, x, \mathbf{t_p})$ with $n \ge 1$, there exists some state $X(t', N+1) = (n+1, x, (\mathbf{t_p}, 0))$, and since these two states must have the same probabilistic behaviour, the G/G/1//N/P and the G/G/1//N+1/P systems have the same probabilistic behaviour as long as $n \ge 1$. In particular, if one request in the queue of the G/G/1//N+1/P system were hidden from view, an observer would be unable to distinguish the G/G/1//N/P queueing system from the G/G/1//N+1/P system so long as $n \ge 1$.

We now introduce some notation. Let $\pi_i^N$ denote the fraction of time that the G/G/1//N/P system has $i$ requests in the queue and in service. Let $n_q^N$ denote the time average of the number of requests queued for but not in service in the G/G/1//N/P system. Finally, let $\rho^N$ denote the fraction of time that the server is busy. We have

$$n_q^N = \sum_{i=2}^{N} (i-1)\,\pi_i^N, \qquad \rho^N = \sum_{i=1}^{N} \pi_i^N = 1 - \pi_0^N.$$

Let $\pi_i^{N+1}$, $n_q^{N+1}$, and $\rho^{N+1}$ be the analogous quantities for the G/G/1//N+1/P system.

Lemma B.2


$\pi_{i+1}^{N+1} = C\,\pi_i^N$ for $i \ge 1$, where $C$ is a constant.


Proof: 


As argued earlier, the G/G/1//N/P system with $i$ requests in the queue and in service is probabilistically identical to the G/G/1//N+1/P system with $i+1$ requests in the queue and in service if $i \ge 1$. It follows that $\pi_{i+1}^{N+1} = C\,\pi_i^N$ for $i \ge 1$, for some constant $C$.


Proceeding with the proof of Theorem 2.2, there are two cases to consider.


Case 1: $w(N) = 0$


Lemma B.3


If $w(N) = 0$ then $\pi_i^{N+1} = 0$ for $i > 2$.


Proof: 


If $w(N) = 0$, then $\pi_i^N = 0$ for $i > 1$. It follows from Lemma B.2 that $\pi_i^{N+1} = 0$ for $i > 2$.


Thus in case 1, $n_q^{N+1} = \pi_2^{N+1} = \rho^{N+1} - \pi_1^{N+1} \le \rho^{N+1}$. From Little's Law we have

$$w(N+1) = \frac{n_q^{N+1}\,\bar{t}_a}{\rho^{N+1}} \le \bar{t}_a.$$

Therefore $w(N+1) - w(N) \le \bar{t}_a$.


Case 2: $w(N) > 0$


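If $w(N) > 0$, then $\pi_i^N > 0$ for some $i > 1$. By Little's Law we have

$$w(N) = \frac{n_q^N\,\bar{t}_a}{\rho^N} \qquad \text{and} \qquad w(N+1) = \frac{n_q^{N+1}\,\bar{t}_a}{\rho^{N+1}},$$

and $\rho^{N+1} = \pi_1^{N+1} + \sum_{i=2}^{N+1} \pi_i^{N+1}$. By Lemma B.2, $\sum_{i=2}^{N+1} \pi_i^{N+1} = C\rho^N$ and $n_q^{N+1} = \sum_{i=1}^{N} i\,\pi_{i+1}^{N+1} = C\,(n_q^N + \rho^N)$. Combining these two relations yields:

$$n_q^{N+1} = C\,n_q^N + \rho^{N+1} - \pi_1^{N+1}.$$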
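The identity above can be spot-checked numerically: build a $\pi^{N+1}$ from an arbitrary $\pi^N$ via Lemma B.2, with $C$ and $\pi_1^{N+1}$ chosen freely (so long as the probabilities stay nonnegative), and compare both sides. This is a minimal sketch that checks only the algebra, not the probabilistic argument behind Lemma B.2; the helper names `nq`, `rho`, and `paired_distribution` are hypothetical.

```python
def nq(pi):
    """Time-average number queued but not in service: sum of (i - 1) * pi_i, i >= 2."""
    return sum((i - 1) * p for i, p in enumerate(pi) if i >= 2)


def rho(pi):
    """Fraction of time the server is busy: sum of pi_i for i >= 1."""
    return sum(pi[1:])


def paired_distribution(pi_N, C, pi1_N1):
    """Hypothetical pi^{N+1} built via Lemma B.2: pi_{i+1}^{N+1} = C * pi_i^N for i >= 1."""
    tail = [C * p for p in pi_N[1:]]      # pi_2^{N+1}, ..., pi_{N+1}^{N+1}
    pi0 = 1.0 - pi1_N1 - sum(tail)        # remaining probability mass at i = 0
    if pi0 < 0:
        raise ValueError("C and pi_1^{N+1} too large to form a distribution")
    return [pi0, pi1_N1] + tail


pi_N = [0.2, 0.3, 0.3, 0.2]               # N = 3
C, pi1 = 0.5, 0.1
pi_N1 = paired_distribution(pi_N, C, pi1)
lhs = nq(pi_N1)
rhs = C * nq(pi_N) + rho(pi_N1) - pi_N1[1]
print(lhs, rhs)                           # equal up to rounding
```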


Therefore

$$w(N+1) - w(N) = \bar{t}_a\left(\frac{n_q^{N+1}}{\rho^{N+1}} - \frac{n_q^N}{\rho^N}\right) = \bar{t}_a\left[1 - \frac{\pi_1^{N+1}}{\rho^{N+1}} + \frac{n_q^N\,(C\rho^N - \rho^{N+1})}{\rho^N\,\rho^{N+1}}\right].$$
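Applying Lemma B.2 again, in the form $C\rho^N - \rho^{N+1} = -\pi_1^{N+1}$, yields finally

$$w(N+1) - w(N) = \bar{t}_a\left[1 - \frac{\pi_1^{N+1}\,(n_q^N + \rho^N)}{\rho^N\,\rho^{N+1}}\right].$$

Since $\pi_1^{N+1} \ge 0$, $n_q^N \ge 0$, and $\rho^N, \rho^{N+1} > 0$, it follows that $w(N+1) - w(N) \le \bar{t}_a$ in case 2 as well.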
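As a rough numerical illustration of Theorem 2.2 (not part of the proof), the sketch below runs an event-driven simulation of the closed G/G/1//N model: $N$ processors alternate between processing and waiting on a single FIFO server. The function name `mean_wait`, the uniform processing and access time distributions, and the run lengths are assumptions chosen for illustration. The waiting time is measured from the issue of a request to the start of its service.

```python
import heapq
import random
from typing import Callable


def mean_wait(n_procs: int,
              sample_tp: Callable[[], float],
              sample_ta: Callable[[], float],
              n_requests: int = 100_000,
              warmup: int = 5_000) -> float:
    """Estimate w(N) for a closed single-server FIFO queue (G/G/1//N).

    Each processor computes for a time drawn from sample_tp, issues one
    request, and stays idle until the server has finished it (access time
    drawn from sample_ta).
    """
    # One entry per processor: (time its next request is issued, processor id).
    events = [(sample_tp(), i) for i in range(n_procs)]
    heapq.heapify(events)
    server_free = 0.0
    total, counted = 0.0, 0
    for k in range(n_requests):
        issue, proc = heapq.heappop(events)  # FIFO: earliest issued request next
        start = max(issue, server_free)      # server may idle until `issue`
        if k >= warmup:
            total += start - issue
            counted += 1
        server_free = start + sample_ta()
        heapq.heappush(events, (server_free + sample_tp(), proc))
    return total / counted


rng = random.Random(12345)
w = [mean_wait(n, lambda: rng.uniform(2.0, 4.0),   # t_p, mean 3
               lambda: rng.uniform(0.5, 1.5))      # t_a, mean 1
     for n in range(1, 9)]
print([round(b - a, 3) for a, b in zip(w, w[1:])])  # each about <= 1
```

Popping the earliest issue time from a heap serves requests in FIFO order because each processor has at most one outstanding request. With deterministic times $t_p = 3$ and $t_a = 1$ the estimates settle at $w(N) = 0$ for $N \le 4$ and then grow by exactly $t_a$ per added processor once the server saturates, which is the case of equality in the bound.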
Theorem  14

Let $F^*_{ab}(s)$ denote the Laplace transform of the probability density function $f_{ab}(x)$ with mean $\bar{x}$, where $f_{ab}(x) = 0$ for $x \notin [a, b]$, $0 < a$, and $b < 2a$. Let $F^*(s)_{M/M/1//N}$ denote the Laplace transform of the exponential density function with the same mean $\bar{x}$. Then $F^*(s)_{M/M/1//N} \ge F^*_{ab}(s)$ for $s$ real and $s \ge 0$.


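Proof:

We are to show that

$$F^*(s)_{M/M/1//N} = \frac{1}{s\bar{x} + 1} \ge F^*_{ab}(s) = \int_a^b e^{-sx} f_{ab}(x)\,dx$$

for $s$ real and $s \ge 0$, where $\bar{x} = \int_a^b x f_{ab}(x)\,dx$. This is equivalent to showing that $e \le 1$, where

$$e = (1 + s\bar{x}) \int_a^b e^{-sx} f_{ab}(x)\,dx.$$

We note that $e$ and all its derivatives with respect to $s$ exist and are continuous in $s$. In addition, $e = 1$ at $s = 0$. It thus suffices to show that $\frac{de}{ds} \le 0$ for $s \ge 0$. Differentiating,

$$\frac{de}{ds} = \bar{x}\int_a^b e^{-sx} f_{ab}(x)\,dx - (1 + s\bar{x})\int_a^b x e^{-sx} f_{ab}(x)\,dx, \qquad \left.\frac{de}{ds}\right|_{s=0} = \bar{x} - \bar{x} = 0,$$

$$\frac{d^2 e}{ds^2} = -2\bar{x}\int_a^b x e^{-sx} f_{ab}(x)\,dx + (1 + s\bar{x})\int_a^b x^2 e^{-sx} f_{ab}(x)\,dx.$$

Now, since $x \le b$ on $[a, b]$,

$$\frac{d^2 e}{ds^2} \le \bigl(-2\bar{x} + (1 + s\bar{x})\,b\bigr)\int_a^b x e^{-sx} f_{ab}(x)\,dx.$$

Since $\int_a^b x e^{-sx} f_{ab}(x)\,dx > 0$ and $-2\bar{x} + (1 + s\bar{x})\,b < 0$ for $s$ real and $s < \frac{2\bar{x} - b}{b\bar{x}}$ (this interval is nonempty because $b < 2a \le 2\bar{x}$), we must have

$$\frac{d^2 e}{ds^2} < 0 \quad \text{for } 0 \le s < \frac{2\bar{x} - b}{b\bar{x}},$$

implying, because $\frac{de}{ds} = 0$ at $s = 0$, that $\frac{de}{ds} \le 0$ for $0 \le s < \frac{2\bar{x} - b}{b\bar{x}}$.

Furthermore, since $x \ge a$ on $[a, b]$,

$$\frac{de}{ds} \le \bigl(\bar{x} - (1 + s\bar{x})\,a\bigr)\int_a^b e^{-sx} f_{ab}(x)\,dx.$$

Since $\int_a^b e^{-sx} f_{ab}(x)\,dx > 0$ and $\bar{x} - (1 + s\bar{x})\,a < 0$ for $s > \frac{\bar{x} - a}{a\bar{x}}$, we must have $\frac{de}{ds} < 0$ for $s > \frac{\bar{x} - a}{a\bar{x}}$.

But $0 < a$ and $b < 2a$ imply $\bar{x}b - ab < 2a\bar{x} - ab$, which (dividing by $ab\bar{x} > 0$) implies

$$\frac{\bar{x} - a}{a\bar{x}} < \frac{2\bar{x} - b}{b\bar{x}},$$

so the two intervals together cover all $s \ge 0$. Therefore $\frac{de}{ds} \le 0$ for $s \ge 0$, and hence $e \le 1$ for $s \ge 0$.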
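The inequality $e \le 1$ can be checked numerically for a concrete density. The sketch below assumes $f_{ab}$ is uniform on $[a, b]$ with $a = 1$ and $b = 1.9$ (so that $b < 2a$), for which $F^*_{ab}(s) = (e^{-sa} - e^{-sb})/(s(b - a))$ in closed form; the function names are hypothetical and the grid of $s$ values is arbitrary.

```python
import math


def lt_uniform(s, a, b):
    """Laplace transform of the uniform density on [a, b] (a stand-in for f_ab)."""
    if s == 0:
        return 1.0
    return (math.exp(-s * a) - math.exp(-s * b)) / (s * (b - a))


def lt_exponential(s, mean):
    """Laplace transform 1 / (1 + s * mean) of the exponential density."""
    return 1.0 / (1.0 + s * mean)


a, b = 1.0, 1.9                              # 0 < a and b < 2a
mean = (a + b) / 2                           # x-bar for the uniform density
grid = [k * 0.05 for k in range(2001)]       # s in [0, 100]
e_values = [(1 + s * mean) * lt_uniform(s, a, b) for s in grid]
print(max(e_values))                         # 1.0, attained at s = 0
```

For this density $e$ drops below 1 immediately, since its second derivative at $s = 0$ equals the variance minus $\bar{x}^2$, which is negative whenever the coefficient of variation is below 1.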


Theorem 3.1
Preamble: 

Consider the following two state descriptions of a Ringbus with $S$ slices, request probabilities $p_i$, $i = -(S/2 - 1), \ldots, -1, 0, 1, \ldots, S/2$, and subject to the assumptions in chapter 3:

State description A: $(r_1, d_1;\ r_2, d_2;\ \ldots;\ r_S, d_S)$

State description B: $(r_1, w_1, d_1;\ r_2, w_2, d_2;\ \ldots;\ r_S, w_S, d_S)$

$r_i$ and $d_i$ denote the request at slice $i$ and the duration of the grant at slice $i$, as discussed in section 3.2.1. $w_i$ denotes the interval for which the request at slice $i$ has waited so far without being granted. We adopt the convention that $w_i = 0$ whenever $d_i \ne 0$. The arbitration problem relative to state description A is to find a policy $D^A$ which maximizes the throughput $g^{D^A}$ given by

$$g^{D^A} = \sum_{(r,d)} \pi_{(r,d)}^{D^A}\, q_{(r,d)}^{d(r,d)}$$

where

$(r,d)$ denotes a particular state (using vector notation),

$d(r,d)$ is the decision in state $(r,d)$,

$q_{(r,d)}^{d(r,d)}$ is the reward in state $(r,d)$ under decision $d(r,d)$,

$p_{(r,d)(r',d')}^{d(r,d)}$ is the one step transition probability from state $(r,d)$ to state $(r',d')$ under decision $d(r,d)$, and

$\pi_{(r,d)}^{D^A}$ is the steady-state probability of being in state $(r,d)$ under policy $D^A$.

The arbitration problem relative to state description B is to find a policy $D^B$ which maximizes the throughput $g^{D^B}$ given by

$$g^{D^B} = \sum_{(r,w,d)} \pi_{(r,w,d)}^{D^B}\, q_{(r,w,d)}^{d(r,w,d)}$$

where

$(r,w,d)$ denotes a particular state (using vector notation),

$d(r,w,d)$ is the decision in state $(r,w,d)$,

$q_{(r,w,d)}^{d(r,w,d)}$ is the reward in state $(r,w,d)$ under decision $d(r,w,d)$,

$p_{(r,w,d)(r',w',d')}^{d(r,w,d)}$ is the one step transition probability from state $(r,w,d)$ to state $(r',w',d')$ under decision $d(r,w,d)$, and

$\pi_{(r,w,d)}^{D^B}$ is the steady-state probability of being in state $(r,w,d)$ under policy $D^B$.


Note that if $(r',w',d')$ is the immediate successor of some state $(r,w,d)$, then $w'_i = 0$ if request $r_i$ was granted and $w'_i = w_i + 1$ otherwise. Thus $p_{(r,w,d)(r',w',d')}^{d(r,w,d)} = 0$ if for any $i$ either a) request $r_i$ is granted and $w'_i \ne 0$, or b) request $r_i$ is not granted and $w'_i \ne w_i + 1$. This can be expressed more concisely as $p_{(r,w,d)(r',w',d')}^{d(r,w,d)} = 0$ if $(r',w',d') \notin W((r,w,d))$, where $(r',w',d') \in W((r,w,d))$ if for each $i = -(S/2 - 1), \ldots, -1, 0, 1, \ldots, S/2$ either $w'_i = 0$ if request $r_i$ is granted, or $w'_i = w_i + 1$ if request $r_i$ is not granted.
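The successor rule for the waiting times can be written as a small helper. This sketch models only the $w$ component of the state; the names `successor_waits` and `in_W` are hypothetical, and whether each request is granted is passed in as one boolean per slice rather than derived from $d$.

```python
from typing import Sequence, Tuple


def successor_waits(w: Sequence[int], granted: Sequence[bool]) -> Tuple[int, ...]:
    """Waiting times of the immediate successor: w'_i = 0 if r_i is granted, else w_i + 1."""
    if len(w) != len(granted):
        raise ValueError("w and granted must cover the same slices")
    return tuple(0 if g else wi + 1 for wi, g in zip(w, granted))


def in_W(w_next: Sequence[int], w: Sequence[int], granted: Sequence[bool]) -> bool:
    """Check only the w-component of the membership (r', w', d') in W((r, w, d))."""
    return tuple(w_next) == successor_waits(w, granted)


# Three slices: the first request is granted this round, the others keep waiting.
print(successor_waits((0, 2, 5), (True, False, False)))  # (0, 3, 6)
```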

Statement: 

Let $D_A^{\mathrm{opt}}$ be any optimum policy for the arbitration problem relative to state description A, with decisions $d^{\mathrm{opt}}(r,d)$. Then, if there is no upper bound constraint on the waiting times $w_i$, an optimum stationary policy $D_B^{\mathrm{opt}}$ for the arbitration problem relative to state description B is the following policy $D^\#$:

Choose $d^\#(r,w,d)$ in each state $(r,w,d)$ such that $d^\#(r,w,d) = d^{\mathrm{opt}}(r,d)$ for all $w$.

Consequently

$$p_{(r,w,d)(r',w',d')}^{d^\#(r,w,d)} = \begin{cases} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)} & \text{for all } w \text{ and all } w' \text{ such that } (r',w',d') \in W((r,w,d)), \\ 0 & \text{otherwise,} \end{cases}$$

and

$$q_{(r,w,d)}^{d^\#(r,w,d)} = q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} \quad \text{for all } w.$$

Furthermore, $g^{D^\#} = g^{D_A^{\mathrm{opt}}}$. Thus the waiting time information $w$ is irrelevant in determining the optimum throughput.

Proof: 

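For the arbitration problem relative to state description A, let $v_{(r,d)}^{D_A^{\mathrm{opt}}}$ denote the value of being in state $(r,d)$ under policy $D_A^{\mathrm{opt}}$, and let $v_1^{D_A^{\mathrm{opt}}}$ denote the value of being in state $1 = (0,0)$; below these are written $v_{(r,d)}$ and $v_1$ for short. Then from equation 3.6 we have

$$g^{D_A^{\mathrm{opt}}} + v_{(r,d)} = q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} + \sum_{(r',d')} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)}\, v_{(r',d')}.$$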


For the arbitration problem relative to state description B, let $V_{(r,w,d)}(n)$ denote the optimal expected total reward accumulated over $n$ rounds if the process started in state $(r,w,d)$ with terminal reward $V_{(r,w,d)}(0)$. Then from equation 3.16 we have

$$V_{(r,w,d)}(n+1) = \max_{d(r,w,d)} \Bigl[ q_{(r,w,d)}^{d(r,w,d)} + \sum_{(r',w',d')} p_{(r,w,d)(r',w',d')}^{d(r,w,d)}\, V_{(r',w',d')}(n) \Bigr]$$




Appendix  B 


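$$= \max\Bigl\{ q_{(r,w,d)}^{d^\#(r,w,d)} + \sum_{(r',w',d')} p_{(r,w,d)(r',w',d')}^{d^\#(r,w,d)}\, V_{(r',w',d')}(n),\ \max_{d(r,w,d) \ne d^\#(r,w,d)} \Bigl[ q_{(r,w,d)}^{d(r,w,d)} + \sum_{(r',w',d')} p_{(r,w,d)(r',w',d')}^{d(r,w,d)}\, V_{(r',w',d')}(n) \Bigr] \Bigr\}$$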


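Substituting in for $p_{(r,w,d)(r',w',d')}^{d^\#(r,w,d)}$ and $q_{(r,w,d)}^{d^\#(r,w,d)}$, we have

$$V_{(r,w,d)}(n+1) = \max\Bigl\{ q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} + \sum_{(r',w',d') \in W((r,w,d))} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)}\, V_{(r',w',d')}(n),\ \max_{d(r,w,d) \ne d^\#(r,w,d)} \Bigl[ q_{(r,w,d)}^{d(r,w,d)} + \sum_{(r',w',d')} p_{(r,w,d)(r',w',d')}^{d(r,w,d)}\, V_{(r',w',d')}(n) \Bigr] \Bigr\}.$$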


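Let the terminal rewards be $V_{(r,w,d)}(0) = v_{(r,d)} - v_1$ for every $(r,w,d)$. Then

$$V_{(r,w,d)}(1) = \max\Bigl\{ q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} + \sum_{(r',d')} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)}\,(v_{(r',d')} - v_1),\ \max_{d(r,w,d) \ne d^\#(r,w,d)} \Bigl[ q_{(r,w,d)}^{d(r,w,d)} + \sum_{(r',w',d') \in W((r,w,d))} p_{(r,w,d)(r',w',d')}^{d(r,w,d)}\,(v_{(r',d')} - v_1) \Bigr] \Bigr\}$$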


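Now every decision $d(r,w,d)$ in state $(r,w,d)$ is the same as some decision $d(r,d)$ in state $(r,d)$. Thus $p_{(r,w,d)(r',w',d')}^{d(r,w,d)} = p_{(r,d)(r',d')}^{d(r,d)}$ and $q_{(r,w,d)}^{d(r,w,d)} = q_{(r,d)}^{d(r,d)}$ for some $d(r,d)$, for every $d(r,w,d)$ and every state $(r,w,d)$. Since $D_A^{\mathrm{opt}}$ is an optimal policy,

$$q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} + \sum_{(r',d')} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)}\,(v_{(r',d')} - v_1) \ge q_{(r,d)}^{d(r,d)} + \sum_{(r',d')} p_{(r,d)(r',d')}^{d(r,d)}\,(v_{(r',d')} - v_1)$$

for every $d(r,d)$, and thus

$$q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} + \sum_{(r',d')} p_{(r,d)(r',d')}^{d^{\mathrm{opt}}(r,d)}\,(v_{(r',d')} - v_1) \ge q_{(r,w,d)}^{d(r,w,d)} + \sum_{(r',w',d') \in W((r,w,d))} p_{(r,w,d)(r',w',d')}^{d(r,w,d)}\,(v_{(r',d')} - v_1)$$

for every $d(r,w,d)$ and every state $(r,w,d)$. The maximum in $V_{(r,w,d)}(1)$ is therefore attained by $d^\#(r,w,d)$, and by the equation from 3.6 above its value is $V_{(r,w,d)}(1) = g^{D_A^{\mathrm{opt}}} + v_{(r,d)} - v_1$ for every state $(r,w,d)$, so that $V_{(r,w,d)}(1) - V_{(r,w,d)}(0) = g^{D_A^{\mathrm{opt}}}$ in every state. Thus the policy $D^\#$ is an optimal stationary policy. It follows that $g^{D_B^{\mathrm{opt}}} = g^{D^\#}$.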


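Under policy $D^\#$, $q_{(r,w,d)}^{d^\#(r,w,d)} = q_{(r,d)}^{d^{\mathrm{opt}}(r,d)}$ for all $w$, and $\sum_w \pi_{(r,w,d)}^{D^\#} = \pi_{(r,d)}^{D_A^{\mathrm{opt}}}$. Thus

$$g^{D^\#} = \sum_{(r,w,d)} \pi_{(r,w,d)}^{D^\#}\, q_{(r,w,d)}^{d^\#(r,w,d)} = \sum_{(r,d)} q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} \sum_w \pi_{(r,w,d)}^{D^\#} = \sum_{(r,d)} \pi_{(r,d)}^{D_A^{\mathrm{opt}}}\, q_{(r,d)}^{d^{\mathrm{opt}}(r,d)} = g^{D_A^{\mathrm{opt}}}.$$

Therefore $g^{D^\#} = g^{D_A^{\mathrm{opt}}}$.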
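The core of the argument, that extending the state with information on which rewards and transitions do not depend leaves the optimal throughput unchanged, can be illustrated on a toy problem. The sketch below assumes a hypothetical 3-state, 2-action average-reward problem in place of state description A, and builds a description-B analogue by appending a waiting counter $w$ that resets when action 1 "grants" and otherwise increments. It uses relative value iteration, a standard variant of the value iteration in equation 3.16 that subtracts a reference state's value at each step; the counter is capped at `K` only to keep the state space finite, and the cap constrains no decision. All names and numbers are illustrative.

```python
# Hypothetical stand-in for state description A: 3 states, 2 actions.
# Every action has a self-loop, so every policy's chain is aperiodic.
Q = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0, (2, 0): 1.0, (2, 1): 0.2}
P = {
    (0, 0): [(0, 0.5), (1, 0.5)],
    (0, 1): [(0, 0.2), (2, 0.8)],
    (1, 0): [(1, 0.3), (2, 0.7)],
    (1, 1): [(0, 0.9), (1, 0.1)],
    (2, 0): [(2, 0.4), (0, 0.6)],
    (2, 1): [(1, 0.5), (2, 0.5)],
}
ACTIONS = (0, 1)
K = 5  # cap on the waiting counter: keeps the state space finite, constrains no decision


def optimal_gain(states, q, p, iters=2000):
    """Relative value iteration: V_{n+1}(s) = max_a [q(s, a) + sum_t p(s, a, t) V_n(t)],
    renormalised so that V(ref) = 0. Returns the gain estimate V_{n+1}(ref) - V_n(ref)."""
    ref = states[0]
    v = {s: 0.0 for s in states}
    g = 0.0
    for _ in range(iters):
        new = {s: max(q(s, a) + sum(pr * v[t] for t, pr in p(s, a)) for a in ACTIONS)
               for s in states}
        g = new[ref] - v[ref]
        v = {s: new[s] - new[ref] for s in states}
    return g


# Description A analogue: the base states alone.
g_A = optimal_gain([0, 1, 2], lambda s, a: Q[s, a], lambda s, a: P[s, a])

# Description B analogue: append a waiting counter w. Action 1 "grants" (w' = 0);
# action 0 does not (w' = w + 1, capped at K). Rewards and base transitions ignore w.
lifted = [(s, w) for s in range(3) for w in range(K + 1)]


def q_B(state, a):
    return Q[state[0], a]


def p_B(state, a):
    s, w = state
    w_next = 0 if a == 1 else min(w + 1, K)
    return [((t, w_next), pr) for t, pr in P[s, a]]


g_B = optimal_gain(lifted, q_B, p_B)
print(g_A, g_B)  # identical gains
```

The two gains agree exactly because, starting from terminal values that do not depend on $w$, every iterate $V(n)$ is also independent of $w$; this mirrors the proof's choice of terminal rewards $V_{(r,w,d)}(0) = v_{(r,d)} - v_1$.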